Preface


Originally, functional analysis was the study of functions. It is now considered to 
be a unifying subject that generalizes much of linear algebra and real/complex 
analysis, with emphasis on infinite dimensional spaces. This book introduces this 
vast topic from these elementary preliminaries and develops both the abstract 
theory and its applications in three parts: (I) Metric Spaces, (II) Banach and Hilbert
Spaces, and (III) Banach Algebras. 

Especially with the digital revolution at the turn of the millennium, Hilbert 
spaces and least squares approximation have become necessary and fundamental 
topics for a mathematical education, not just for mathematicians, but also for
engineers, physicists, and statisticians interested in signal processing, data analysis,
regression, quantum mechanics, etc. Banach spaces, in particular $L^1$ and $L^\infty$
methods, have gained popularity in applications and are complementing or even 
supplanting the classical least squares approach to many optimization problems. 


Aim of this Book 


The main aim of this book is to provide the reader with an introductory textbook 
that starts from elementary linear algebra and real analysis and develops the theory 
sufficiently to understand how various applications, including least squares 
approximation, etc., are all part of a single framework. A textbook must try to 
achieve a balance between rigor and understanding: not being too elementary by 
omitting ‘hard’ proofs, but neither too advanced by using too strict a language for 
the average reader and treating theorems as mere stepping stones to yet other 
theorems. Despite the multitude of books in this area, there is still a perceived gap 
in learning difficulty between undergraduate and graduate textbooks. This book 
aims to be in the middle: it covers much material and has many exercises of 
varying difficulty, yet the emphasis is for the student to remember the theory 
clearly using intuitive language. For example, real analysis is redeveloped from 
the broader picture of metric spaces (including a construction of the real number 
space), rather than through the even more abstract topological spaces. 


Audience 


This book is meant for the undergraduate who is interested in mathematical 
analysis and its applications, or the research engineer/statistician who would like a 
more rigorous approach to fundamental mathematical concepts and techniques. It 
can also serve as a reference or for self-study of a subject that occupies a central 
place in modern mathematics, opening up many avenues for further study. 

The basic requirements are mainly the introductory topics of mathematics: Set 
and Logic notation, Vector Spaces, and Real Analysis (calculus). Apart from these, 
it would be helpful, but not necessary, to have taken elementary courses in Fourier 
Series, Lebesgue Integration, and Complex Analysis. Reviews of Vector Spaces 
and Measurable sets are included in this book, while the other two mentioned 
subjects are developed only to the extent needed. 

Examples are included from many areas of mathematics that touch upon 
functional analysis. It would be helpful at the appropriate places, for the reader to 
have encountered these other subjects, but this is not essential. The aim is to make 
connections and describe them from the viewpoint of functional analysis. With the 
modern facilities of searching over the Internet, anyone interested in following up 
a specific topic can easily do so. 

The sections follow each other in a linear fashion, with the three parts fitting 
into three one-semester courses, although Part II is twice as long as the others. The 
following sections may be omitted without much effect on subsequent topics: 


Section 6.4 C(X, Y)

Section 9.2 Function Spaces

Section 11.5 Pointwise and Weak Convergence

Sections 12.1 and 12.2 Differentiation and Integration

Sections 14.4 and 14.5 Functional Calculus and the Gelfand Transform

Section 15.4 Representation Theorems.
Chapter 1
Introduction 


Much of modern mathematics depends upon extending the finite to the infinite. 
In this regard, imagine extending the geometric vectors that we are familiar with to 
an infinite number of components. That is, consider 


$$v = a_1 e_1 + a_2 e_2 + \cdots = (a_1, a_2, a_3, \ldots)$$


where $e_i$ are unit independent vectors just like i, j and k in Cartesian geometry. It is
not at all clear that we can do so; for starters, what do those three dots “$\cdots$” on
the right-hand side mean? Surely they signify that as more terms are taken one gets
better approximations of v. This immediately suggests that not every such “infinite”
vector is allowed; for example, it might be objected that the vector

$$v = e_1 + e_2 + e_3 + \cdots$$

cannot be approximated by a finite number of these unit vectors, as the remainder
$e_N + \cdots$ looks as large as v. Instead we might allow the infinite vector


$$v = e_1 + \tfrac{1}{2} e_2 + \tfrac{1}{3} e_3 + \cdots,$$


although even here, it is unclear whether this may also grow large, just as

$$1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots = \infty.$$

To continue with our experiment, let us just say that the coefficients become zero 
rapidly enough. 
There are all sorts of things we can attempt to do with these “infinite” vectors, by 
analogy with the usual vectors: addition of vectors and multiplication by a number 
are easily accomplished, 


$$(a_1, a_2, a_3, \ldots) + (b_1, b_2, b_3, \ldots) = (a_1 + b_1, a_2 + b_2, a_3 + b_3, \ldots),$$

$$2 \times (0, 1, -\tfrac{1}{2}, \ldots) = (0, 2, -1, \ldots).$$

One can even generalize the “dot product” 
$$(a_1, a_2, \ldots) \cdot (b_1, b_2, \ldots) = a_1 b_1 + a_2 b_2 + \cdots,$$


assuming the series converges—and we have no guarantee that it always does. 
For example, if x is equal to $(1, 1/\sqrt{2}, 1/\sqrt{3}, \ldots)$, then $x \cdot x = \sum_{n=1}^{\infty} 1/n$ is
infinite. Again let us remedy this situation by insisting that vectors have coefficients 
that decrease to 0 fast enough. 
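
The difference between coefficients that decrease "fast enough" and those that do not can be seen numerically. The following is only an illustrative sketch: the helper names `partial_dot`, `slow` and `fast` are introduced here and are not part of the text. It compares truncations of x · x for x = (1, 1/√2, 1/√3, ...) with the same sum for the faster-decreasing coefficients 1/n.

```python
import math

def partial_dot(x, y, n_terms):
    """Sum of x(n) * y(n) for n = 1, ..., n_terms (a finite truncation of the 'dot product')."""
    return sum(x(n) * y(n) for n in range(1, n_terms + 1))

def slow(n):
    # Coefficients of x = (1, 1/sqrt(2), 1/sqrt(3), ...)
    return 1 / math.sqrt(n)

def fast(n):
    # Coefficients that decrease faster: (1, 1/2, 1/3, ...)
    return 1 / n

for n_terms in (10, 1000, 100000):
    # The first column of sums keeps growing; the second settles down.
    print(n_terms, partial_dot(slow, slow, n_terms), partial_dot(fast, fast, n_terms))
```

The first sum is a partial sum of the harmonic series and grows without bound (roughly like the logarithm of the number of terms), while the second approaches a finite limit.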

Having done this, we may go on to see what infinite matrices would look like. 
They would take an infinite vector and return another infinite one, as follows, 


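$$\begin{pmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix},$$

where $y_1 = a_{11} x_1 + a_{12} x_2 + \cdots = \sum_n a_{1n} x_n$, etc. Perhaps we may need to have the
rows of the matrix vanish sufficiently rapidly as we go down and to the right of the
matrix.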

Once again, many familiar ideas from finite matrices seem to generalize to this 
infinite setting. Not only is it possible to add and multiply these matrices without
any inherent difficulty, but methods such as Gaussian elimination can also be
applied in principle. There seems to be no intrinsic problem to working with
infinite-dimensional linear algebra.

It may come as a slight surprise to the reader that in fact he/she has already met 
these infinite vectors before! When a function is expanded as a MacLaurin series 


$$f(x) = f(0) + f'(0)\,x + \tfrac{1}{2} f''(0)\,x^2 + \cdots$$


it is in effect written as an infinite sum of the basis vectors (or functions) $1, x, x^2, \ldots$,
each with the numerical coefficients $f(0), f'(0), \tfrac{1}{2} f''(0), \ldots$, respectively.
Adding two functions is the same as adding the two infinite vectors (or series); 
and multiplying by a number is equivalent to multiplying each coefficient by the 
same number. What about infinite matrices? Take a look at the following form of 
differentiation, here written in matrix form, 


$$f' = \begin{pmatrix} 0 & 1 & 0 & \cdots \\ 0 & 0 & 2 & \\ & & 0 & 3 \\ & & & \ddots \end{pmatrix} \begin{pmatrix} f(0) \\ f'(0) \\ \tfrac{1}{2} f''(0) \\ \vdots \end{pmatrix}$$

And just as there are various bases that can be used in geometry, so there are
different ways to expand functions, the most celebrated being the Fourier series 


$$f(x) = a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \cdots.$$


The basis vectors are now 1, cos x, sin x, cos 2x, etc. What matrix does
differentiation take with respect to this basis?
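
For the power basis 1, x, x², ... discussed earlier, the differentiation matrix can be tried out on a finite corner. This is a minimal sketch under the assumption that we only keep the first few coefficients; `diff_matrix` and `apply` are names chosen here for illustration.

```python
from fractions import Fraction

def diff_matrix(size):
    """Top-left size x size corner of the differentiation 'matrix' in the basis 1, x, x^2, ..."""
    return [[Fraction(col) if col == row + 1 else Fraction(0) for col in range(size)]
            for row in range(size)]

def apply(matrix, vector):
    """Ordinary matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# f(x) = 5 + 3x + 2x^2 + x^3, stored as its coefficient vector (padded with a zero).
f = [Fraction(c) for c in (5, 3, 2, 1, 0)]

# The result holds the coefficients of f'(x) = 3 + 4x + 3x^2.
print(apply(diff_matrix(5), f))
```

The only non-zero entries sit just above the diagonal, which is the pattern 1, 2, 3, ... of the matrix in the text. The analogous computation in the Fourier basis is left as the question above poses it.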


If we accept that all this is possible and makes sense, we are suddenly made 
aware of a new unification of mathematics: certain differential equations are matrix 
equations, the Fourier and Laplace transforms can be thought of as generalized 
“matrices” mapping a function (vector) to another function, etc. Solving a linear 
differential equation, and finding the inverse Fourier transform, are equivalent to 
finding the inverse of their “matrices”. 

Do we gain anything by converting to a matrix picture? Apart from the practical 
matter that there are many known algorithms that deal with matrices, a deeper reason 
is that linear algebra and geometry give insights to the subject of functions that we 
may not have had before. Euclid’s theorems may possibly still be valid for functions 
if we think of them as ‘points’ in an infinite-dimensional vector space. We wake up 
to the possibility of a function being perpendicular to another, for example, and that 
a function may have a closest function in a “plane” of functions. 

Conversely, ideas from classical analysis may be transferred to linear algebra. 
Since square matrices can be multiplied with themselves, can the geometric series 
$1 + A + A^2 + \cdots$ make sense for matrices? Perhaps one can take the exponential
of a matrix $e^A := 1 + A + A^2/2! + A^3/3! + \cdots$. There’s no better way than to take
the plunge and try it out, say on the differentiation ‘matrix’ D,


$$e^D f(x) = (1 + D + D^2/2 + \cdots) f(x) = f(x) + f'(x) + f''(x)/2 + \cdots = f(x + 1)$$


(by a Taylor expansion around x). The “matrix” $e^D$ certainly has meaning: it performs
an unexpected, if mundane, operation: it shifts the function f one step to the
left! Again, suppose we have the equation $y' - 2y = e^x$; manipulating the derivative
blindly as if it were a number gives a correct solution (but not the general
solution)


$$y = (D - 2)^{-1} e^x = -\tfrac{1}{2}\,(1 + D/2 + D^2/4 + \cdots)\, e^x = -e^x.$$


Yet repeating for the equation $y' - 2y = e^{2x}$ fails to give a meaningful solution.
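
The claim that the exponential of D shifts a function by one unit can be checked exactly on polynomials, where the series 1 + D + D²/2! + ... stops after finitely many terms. The sketch below is illustrative only; the function names `diff`, `exp_D` and `evaluate` are assumptions made here, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction
from math import factorial

def diff(coeffs):
    """Coefficients of p' given the coefficients of p (lowest degree first), same length."""
    return [Fraction(k) * coeffs[k] for k in range(1, len(coeffs))] + [Fraction(0)]

def exp_D(coeffs):
    """Apply 1 + D + D^2/2! + ... to a polynomial; the terms vanish once D^k p = 0."""
    total = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]
    for k in range(len(coeffs)):
        total = [t + d / factorial(k) for t, d in zip(total, term)]
        term = diff(term)
    return total

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

p = [1, 0, -2, 1]          # p(x) = 1 - 2x^2 + x^3
shifted = exp_D(p)         # should be the coefficients of p(x + 1)
print(shifted)
print([evaluate(shifted, x) == evaluate(p, x + 1) for x in range(-3, 4)])
```

For functions that are not polynomials the series no longer terminates, and whether it converges is exactly the kind of question that requires the care discussed in what follows.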

In fact, historically, the subject of functional analysis as we know it started in 
the 19th century when mathematicians started to notice the connections between 
differential equations and matrices. For example, the equation 


$$y'(x) = a(x)\,y(x) + g(x)$$

can be written in equivalent form as


$$y(x) = y(x_0) + \int_{x_0}^{x} a(s)\,y(s)\,ds + \int_{x_0}^{x} g(s)\,ds. \qquad (1.1)$$


The integral $\int_{x_0}^{x} a(s)\,y(s)\,ds$ is an infinitesimal version of $\sum_n a_n y_n$ and can be
thought of as a transformation of y(x). Equation (1.1) is akin to a matrix equation
$y = Ay + b$, and we are tempted to try out the solution $y = (1 - A)^{-1} b =
(1 + A + A^2 + \cdots)\,b$.
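
For an ordinary finite matrix the tempting formula can be tested directly. The following sketch assumes a hypothetical 2 × 2 matrix A whose powers shrink, so that the series b + Ab + A²b + ... settles down; the names `mat_vec` and `neumann_solve` are introduced here for illustration.

```python
def mat_vec(A, v):
    """Ordinary matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def neumann_solve(A, b, n_terms=200):
    """Approximate y = b + A b + A^2 b + ..., a candidate solution of y = A y + b."""
    y = [0.0] * len(b)
    term = list(b)
    for _ in range(n_terms):
        y = [yi + ti for yi, ti in zip(y, term)]
        term = mat_vec(A, term)
    return y

A = [[0.2, 0.1], [0.0, 0.5]]      # hypothetical 'small' matrix: its powers tend to zero
b = [1.0, 1.0]
y = neumann_solve(A, b)

# The residual y - (A y + b) should be essentially zero.
residual = [yi - (ayi + bi) for yi, ayi, bi in zip(y, mat_vec(A, y), b)]
print(y, residual)
```

With a matrix whose powers grow instead, the partial sums blow up and the formula gives nothing useful, which is one reason the meaning of such series needs to be pinned down.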


Nonetheless, technical problems in carrying out this generalization arise immediately:
are the components of an infinite vector unique? They would be if the vectors $e_n$
are in some sense ‘perpendicular’ to each other. But what is this supposed to mean,
say for the MacLaurin series? After all, there do exist non-zero functions whose
MacLaurin coefficients are all zero. The question of whether the Fourier coefficients
are unique took almost a century to answer! And extra care must be taken to handle
infinite vectors. For example, let


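$$\begin{aligned}
v_1 &:= (1, 0, 0, 0, \ldots) \\
v_2 &:= (-1, 1, 0, 0, \ldots) \\
v_3 &:= (0, -1, 1, 0, \ldots) \\
v_4 &:= (0, 0, -1, 1, \ldots)
\end{aligned}$$

It seems clear that

$$v_1 + v_2 + v_3 + \cdots = 0,$$

yet the size of the sum of the first n vectors never diminishes:

$$v := v_1 + \cdots + v_n = (0, \ldots, 0, 1, 0, \ldots) \;\Rightarrow\; v \cdot v = 1.$$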


Because of these pitfalls, we need to proceed with extra caution. It turns out that
many of the equations written above are capable of different interpretations and so 
cannot be taken to be literally true. 
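
The telescoping example above is easy to reproduce on truncated vectors. This is a small sketch, assuming we keep only the first `length` components; the helpers `v` and `partial_sum` are named here for illustration.

```python
def v(n, length):
    """First `length` components of v_n: v_1 = (1, 0, 0, ...), v_n has -1 then 1 for n >= 2."""
    vec = [0] * length
    vec[n - 1] = 1
    if n >= 2:
        vec[n - 2] = -1
    return vec

def partial_sum(n, length):
    """Componentwise sum v_1 + ... + v_n."""
    total = [0] * length
    for k in range(1, n + 1):
        total = [t + c for t, c in zip(total, v(k, length))]
    return total

for n in (1, 2, 5):
    s = partial_sum(n, 8)
    # Each partial sum is a single 1 that moves to the right; its 'size' s.s stays 1.
    print(n, s, sum(c * c for c in s))
```

Every fixed component of the partial sums eventually becomes 0, yet their size never decreases, so the two natural readings of "the sum is 0" disagree.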

These considerations force us to consider the meaning of convergence. The reader 
may already be familiar with the real line $\mathbb{R}$, in which one can speak about convergence
of sequences of numbers, and continuity of functions. Some of the main results
in real analysis are


(i) Cauchy sequences converge,
(ii) for continuous functions, if $x_n \to x$ then $f(x_n) \to f(x)$,


(iii) continuous real functions are bounded on intervals of type [a, b] and have the 
intermediate value property. 
We seek generalizations of these to $\mathbb{R}^N$ and possibly to infinite dimensional spaces.
We do seem to have an intuitive sense of what it means for vectors to converge
$x_n \to x$, but can it be made rigorous? Is it true that if $x_n \to x$ and $y_n \to y$
then $f(x_n, y_n) \to f(x, y)$ when f is a continuous function? Are continuous real
functions bounded on “rectangles” $[a, b] \times [c, d]$, and is the latter the correct analogue
of an “interval”? Since vector functions are common in applications, it is important
to show how these theorems apply in a much more general setting than $\mathbb{R}$, and this
can be achieved by stripping off any inessential structure, such as its order (<). As
we proceed to answer these questions, we will see that the real line is very special
indeed. Intervals play several roles in real analysis, roles that are distinguished apart
in $\mathbb{R}^N$, where we speak instead of connected sets, balls, etc.


The book is divided into three parts: the first considers convergence, continuity, 
and related concepts, the second part treats infinite vectors and their matrices, and 
the third part tackles infinite series of matrices and more. 

Functional analysis is a rich subject because it combines two large branches of 
mathematics: the topological branch concerns itself with convergence, continuity, 
connectivity, boundedness, etc.; the algebraic branch concerns itself with operations, 
groups, rings, vectors, etc. Problems from such different fields as matrix algebras, 
differential equations and approximation theory, can be unified in one framework. As 
in most of mathematics, there are two streams of study: the abstract theory deduces 
the general results, starting from axioms, while the concrete examples are shown 
to be part of this theory. Inevitably, the former appears elegant and powerful, and 
the latter full of detail and perhaps daunting. Nonetheless, both pedagogically and 
historically, it is often by examples that one understands the abstract, and by the 
theory that one makes headway with concrete problems. 


Most sections contain a number of worked out examples, notes, and exercises: it 
is suggested that a section is first read in full, including its propositions and exercises. 
These exercises are an essential part of the book; they should be worked out before 
moving to the next section (some hints and answers are provided in the appendix,
and many worked solutions can be found on the book’s website,
http://www.springer.com/mathematics/analysis/book/978-3-319-06727-8). To prevent the exercises from
becoming a litany of “Show ...” and “Prove ...”, these terms have frequently been 
omitted, partly to instil an attitude of critical reading. As a guide, the notes and 
exercises have been marked as follows: 


> refers to important notes and results; 
* more advanced or difficult exercises that can be skipped on a first reading; 


side remarks that can be skipped without losing any essential ideas. 
1.1 Preliminaries


Familiarity with the following mathematical notions and notation is assumed: 


Logic and Sets 


The basic logical symbols are $\Rightarrow$ (implies), NOT, AND, OR, as well as the quantifiers
$\exists$ (there exists) and $\forall$ (for all). The reader should be familiar with the basic proof
strategies, such as proving $\phi \Rightarrow \psi$ by its contrapositive (NOT $\psi$) $\Rightarrow$ (NOT $\phi$), and
proofs by contradiction. The negation of $\forall x\, \phi_x$ is $\exists x$ (NOT $\phi_x$); and NOT ($\exists x\, \phi_x$)
is the same as $\forall x$ (NOT $\phi_x$). The symbol := is used to define the left-hand symbol
as the right-hand expression, e.g. $e := \sum_{n=0}^{\infty} \frac{1}{n!}$.

A set consists of elements, and $x \in A$ denotes that x is an element of the set A.
The empty set $\emptyset$ contains no elements, so $x \in \emptyset$ is a contradiction.

The following sets of numbers are the foundational cornerstones of mathematics:
the natural numbers $\mathbb{N} = \{0, 1, \ldots\}$, the integers $\mathbb{Z}$, the rational numbers $\mathbb{Q}$, the real
numbers $\mathbb{R}$, and the complex numbers $\mathbb{C}$. The induction principle applies for $\mathbb{N}$,


If $A \subseteq \mathbb{N}$ AND $0 \in A$ AND $\forall n,\ (n \in A \Rightarrow n + 1 \in A)$ then $A = \mathbb{N}$.


Although variables should be quantified to make sense of statements, as in
$\forall a \in \mathbb{Q},\ a^2 \neq 2$, in practice one often takes shortcuts to avoid repeating the obvious.
This book uses the convention that if a statement mentions variables without accompanying
quantifiers, say, $\|x + y\| \leq \|x\| + \|y\|$, these are assumed to be $\forall x, \forall y$, etc.,
in the space under consideration. Natural numbers are usually, but not exclusively,
denoted by the variables $m, n, N, \ldots$, real numbers by $a, b, \ldots$, and complex numbers
by $z, w, \ldots$. An unspecified X (or Y) refers to a metric space, a normed space,
or a Banach algebra, depending on the chapter.

Sets are often defined in terms of a property, $A := \{x \in X : \phi_x\}$, where X is
a given ‘universal set’ and $\phi_x$ a statement about x. For example,
$\mathbb{R}^+ := \{x \in \mathbb{R} : x \geq 0\}$.

$A \subseteq B$ denotes that A is a subset of B, i.e., $x \in A \Rightarrow x \in B$; $A \subset B$ means
$A \subseteq B$ but $A \neq B$. A “non-trivial” or “proper” subset of X is one which is not
$\emptyset$ or X. “Nested sets” are contained in each other as in $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ or
$\cdots \subseteq A_2 \subseteq A_1$.

The complement of a set A is denoted by $X \setminus A$, or by $A^c$ for short; $(A^c)^c = A$, and
$A \subseteq B \Leftrightarrow B^c \subseteq A^c$. $A \cap B$ and $A \cup B$ are the intersection and union of two sets,
respectively. Two sets are “disjoint” when $A \cap B = \emptyset$. De Morgan’s laws state that
$(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$. In general, the union and intersection
of a number of sets are denoted by $\bigcup_i A_i$ and $\bigcap_i A_i$ (where the range of the index
i is understood by the context). A “cover” of A is a collection of sets $\{B_i : i \in I\}$
whose union includes A, i.e., $A \subseteq \bigcup_i B_i$; a “partition” of X is a cover by disjoint
subsets of X.
Pairs of elements are denoted by $(x, y)$, or as $\binom{x}{y}$, generalized to finite ordered
lists $(x_1, \ldots, x_N)$. The product of two sets is the set of pairs

$$X \times Y := \{(x, y) : x \in X,\ y \in Y\};$$

in particular $X^2 := X \times X = \{(x, y) : x, y \in X\}$, and by analogy

$$X^N := \{(x_1, \ldots, x_N) : x_i \in X,\ i = 1, \ldots, N\}.$$


An important example is the plane $\mathbb{R}^2$, whose points are pairs of real numbers (called
“coordinates”). The unit disk is $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$; its perimeter is the
unit circle $S^1 := \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$.


Functions 


A function $f : X \to Y$, $x \mapsto f(x)$, assigns, for every input $x \in X$, a unique
output element $f(x) \in Y$. (It need not be an explicit procedure.) X is called the
“domain” of f and Y its “codomain”. Functions are also referred to as “maps” or
“transformations”. To avoid being too pedantic, we sometimes say, for example, “the
function $x \mapsto e^x$” without reference to the domain and codomain, when these are
obvious from the context. The “image” of a subset $A \subseteq X$, and the “pre-image” of
a subset $B \subseteq Y$ are


$$fA := \{f(a) \in Y : a \in A\}, \qquad f^{-1}B := \{a \in X : f(a) \in B\}.$$


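The image of f is $\operatorname{im} f := fX$. It is easy to show that for any number of sets $A_i$,

$$f \bigcup_i A_i = \bigcup_i fA_i, \qquad f \bigcap_i A_i \subseteq \bigcap_i fA_i,$$

$$f^{-1} \bigcup_i A_i = \bigcup_i f^{-1}A_i, \qquad f^{-1} \bigcap_i A_i = \bigcap_i f^{-1}A_i.$$


The set of functions $f : X \to Y$ is denoted by $Y^X$.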

Some functions can be composed together $f \circ g(x) := f(g(x))$ whenever the
image of g lies in the domain of f. Composing with the trivial identity function
$I : X \to X$, $x \mapsto x$ (one for each set X), has no effect, $f \circ I = f$.

The restriction of a function $f : X \to Y$ to a subset $M \subseteq X$ is the function
$f|_M : M \to Y$ which agrees with f on M, i.e., $f|_M(x) = f(x)$ whenever $x \in M$.
Conversely, an extension of a function is another function $\tilde{f} : A \to Y$ where $X \subseteq A$,
such that $\tilde{f}(x) = f(x)$ whenever $x \in X$.


The reader should be familiar with the functions $x \mapsto -x,\ x^n,\ |x|$, for $x \in \mathbb{R}$
or $\mathbb{C}$; $(x, y) \mapsto x + y,\ xy$, with domain $\mathbb{R}^2$ or $\mathbb{C}^2$; $(x, y) \mapsto x/y$ for $y \neq 0$; and
$(x_1, \ldots, x_N) \mapsto \max(x_1, \ldots, x_N)$ for real numbers $x_i$. In particular, the absolute
value function satisfies

$$|a + b| \leq |a| + |b|, \qquad |a| \geq 0, \qquad |a| = 0 \Leftrightarrow a = 0, \qquad |ab| = |a|\,|b|.$$


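Conjugation is the function $\bar{\ } : \mathbb{C} \to \mathbb{C}$, $a + ib \mapsto a - ib$; its properties are

$$\overline{z + w} = \bar{z} + \bar{w}, \qquad \overline{zw} = \bar{z}\,\bar{w}, \qquad \bar{\bar{z}} = z, \qquad \bar{z} z = |z|^2.$$

The Kronecker delta function is $\delta_{ij} := \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$. The exponential function
$x \mapsto e^x$, $\mathbb{R} \to \mathbb{R}$, may be defined by $e^x := \sum_{n=0}^{\infty} \frac{x^n}{n!}$; it satisfies $e^0 = 1$ and
$e^x > 0$.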


Sequences are functions $x : \mathbb{N} \to X$, but $x(n)$ is usually written as $x_n$, and the
whole sequence x is referred to by $(x_n)_{n \in \mathbb{N}}$ or $(x_0, x_1, \ldots)$ or even just $(x_n)$; real or
complex-valued sequences are denoted by bold symbols, $\boldsymbol{x}$. For example $(1/2^n)$ is the
sequence $(1, 1/2, 1/4, \ldots)$, which is shorthand for $0 \mapsto 1$, $1 \mapsto 1/2$, etc. It is important
to realize that $(x_n)$ is a function and not a set of values, e.g. $(1, -1, 1, -1, \ldots)$
is quite different from $(-1, 1, 1, 1, \ldots)$ and $(-1, 1, -1, 1, \ldots)$, even if they have
the same set of values. The set of real-valued sequences is denoted by
$\mathbb{R}^{\mathbb{N}} := \{\boldsymbol{x} : \mathbb{N} \to \mathbb{R}\}$, and of the complex-valued sequences by $\mathbb{C}^{\mathbb{N}}$. Functions $x : \mathbb{Z} \to X$ are
also sometimes called sequences and are denoted by $(x_n)_{n \in \mathbb{Z}}$.

Polynomials (of one variable) are functions $p : \mathbb{C} \to \mathbb{C}$ that are a finite number of
compositions of additions and multiplications only; every polynomial can be written
in the standard form $p(z) = a_n z^n + \cdots + a_1 z + a_0$ ($a_i \in \mathbb{C}$, $a_n \neq 0$ unless $p = 0$);
n is called the degree of p.


A function $f : X \to Y$ is 1-1 (“one-to-one”) or injective when $f(x) = f(y) \Rightarrow x = y$;
it is onto or surjective when $fX = Y$. A bijection is a function which is both
1-1 and onto; every bijection has an inverse function $f^{-1}$, whereby $f^{-1} \circ f(x) = x$,
$f \circ f^{-1}(y) = y$.

Sets may be finite, countably infinite, or uncountable, depending on whether there
exists a bijection from the set to, respectively, (i) a set $\{1, \ldots, n\}$ for some natural
number n, or (ii) $\mathbb{N}$, or (iii) otherwise. In simple terms, a set is countable when its
elements can be listed, and finite when the list terminates. If A, B are countable sets
then so is $A \times B$; more generally, the union of the countable sets $A_n$, $n = 0, 1, 2, \ldots$,
is again countable:


$$\begin{aligned}
A_0 &= \{a_{00}, a_{01}, a_{02}, \ldots\} \\
A_1 &= \{a_{10}, a_{11}, a_{12}, \ldots\} \\
A_2 &= \{a_{20}, a_{21}, a_{22}, \ldots\} \\
&\ \ \vdots
\end{aligned}$$

$$\bigcup_{n=0}^{\infty} A_n = \{a_{00}, a_{01}, a_{10}, a_{02}, \ldots\}$$
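
The listing of the union proceeds along successive diagonals of the array above. As a sketch of this idea, the generator below yields the index pairs (n, k) of the elements $a_{nk}$ in that order; the name `diagonal_indices` is an assumption made here for illustration.

```python
from itertools import islice

def diagonal_indices():
    """Yield index pairs (n, k) along successive anti-diagonals: (0,0), (0,1), (1,0), (0,2), ..."""
    total = 0
    while True:
        for n in range(total + 1):
            yield (n, total - n)
        total += 1

# The first six pairs correspond to a_00, a_01, a_10, a_02, a_11, a_20.
first = list(islice(diagonal_indices(), 6))
print(first)
```

Every pair (n, k) is reached after finitely many steps, namely on the diagonal with n + k = total, which is what makes the union listable.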


A relation is a statement about pairs of elements taken from $X \times Y$, e.g. $x = y^2 + 1$
for $(x, y) \in \mathbb{R}^2$. An equivalence relation $\sim$ on a set X is one which is


reflexive    $x \sim x$,
symmetric    $x \sim y \Leftrightarrow y \sim x$,
transitive   $x \sim y \sim z \Rightarrow x \sim z$.


An equivalence relation induces a partition of the set X into equivalence classes
$[a] := \{x \in X : x \sim a\}$.

An order $\leq$ is a relation which is reflexive, transitive and anti-symmetric,
$x \leq y \leq x \Rightarrow x = y$. One writes $x < y$ when $x \leq y$ but $x \neq y$. A linear order is
one which also satisfies $x \leq y$ OR $y \leq x$. A number x is “positive” when $x \geq 0$,
whereas “strictly positive” means $x > 0$. An “upper bound” of a set A is a number b
which is greater than or equal to every $a \in A$. A “least upper bound”, denoted sup A, is the smallest
such upper bound (if it exists), i.e., every upper bound of A is greater than or equal to
sup A. There are analogous definitions of lower bounds and greatest lower bounds,
denoted inf A.


A group is a set G with an associative operation and an identity element 1, such
that each element $x \in G$ has an inverse element $x^{-1}$,


$$x(yz) = (xy)z, \qquad 1x = x = x1, \qquad x x^{-1} = 1 = x^{-1} x.$$


A subgroup is a subset of G which is itself a group with the same operation and
identity. A normal subgroup is a subgroup H such that $x^{-1} H x \subseteq H$ for all $x \in G$.
An example of a group is the set $\mathbb{C} \setminus \{0\}$ with the operation of multiplication; the set
$S := \{e^{i\theta} : \theta \in \mathbb{R}\}$ is a subgroup since $e^{i\theta} e^{i\phi} = e^{i(\theta + \phi)}$, $1 = e^{i0}$, and $(e^{i\theta})^{-1} = e^{-i\theta}$
are all in S.

A field $\mathbb{F}$ is a set of numbers, such as $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$, whose elements can be added
and multiplied together associatively, commutatively, and distributively,


$$\forall a, b, c \in \mathbb{F}, \qquad (a + b) + c = a + (b + c), \qquad (ab)c = a(bc),$$
$$a + b = b + a, \qquad ab = ba,$$
$$(a + b)c = ac + bc;$$

there is a zero 0 and an identity 1, every element a has an additive inverse, or negative,
$-a$, and every $a \neq 0$ has a multiplicative inverse, or reciprocal, $1/a$,


$$0 + a = a, \qquad 1a = a, \qquad a + (-a) = 0, \qquad a \cdot \frac{1}{a} = 1 \ (a \neq 0).$$


The real number space $\mathbb{R}$ is that unique field which has a linear order $\leq$ such that


(a) $a \geq b \Rightarrow a + c \geq b + c$, and $a, b \geq 0 \Rightarrow ab \geq 0$,
(b) Every non-empty subset with an upper bound has a least upper bound.


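The intervals are the subsets


$$\begin{aligned}
[a, b] &:= \{x \in \mathbb{R} : a \leq x \leq b\}, & ]a, b] &:= \{x \in \mathbb{R} : a < x \leq b\}, \\
[a, b[ &:= \{x \in \mathbb{R} : a \leq x < b\}, & ]a, b[ &:= \{x \in \mathbb{R} : a < x < b\}, \\
[a, \infty[ &:= \{x \in \mathbb{R} : a \leq x\}, & ]a, \infty[ &:= \{x \in \mathbb{R} : a < x\}, \\
]{-\infty}, a] &:= \{x \in \mathbb{R} : x \leq a\}, & ]{-\infty}, a[ &:= \{x \in \mathbb{R} : x < a\},
\end{aligned}$$


where a < b are fixed real numbers. The real numbers satisfy the Archimedean
property


$$\forall x > 0,\ \exists n \in \mathbb{N},\ n > x.$$


The proof is simple: If the set $\mathbb{N}$ had an upper bound in $\mathbb{R}$ then it would have a least
upper bound $\alpha$; by definition, this implies that $\alpha - 1$ is not an upper bound, meaning
there is a number $n \in \mathbb{N}$ such that $n > \alpha - 1$; yet $n + 1 \leq \alpha$ because $n + 1 \in \mathbb{N}$. This contradiction
shows that no $x \in \mathbb{R}$ is an upper bound of $\mathbb{N}$: there is an $n \in \mathbb{N}$ such that $n > x$.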


There is an important set principle that is not usually covered in elementary 
mathematics textbooks: 


The Axiom of Choice: If $\mathcal{A} = \{A_\alpha : \alpha \in I\}$ is a collection of non-empty subsets
of a set X (the index $\alpha$ ranges over some set I), then there is a function $f : I \to X$
such that $f(\alpha) \in A_\alpha$.

That is, this ‘choice’ function selects an element from each of the sets $A_\alpha$. The
Axiom of Choice is often used to create a sequence $(x_n)$ from a given list of non-empty
sets $A_n$, with $x_n \in A_n$. It seems obvious that if a set is non-empty then an
element of it can be selected, but the existence of such a procedure cannot be proved 
from the other standard set axioms. 


Part I 
Metric Spaces 


Chapter 2 
Distance 


Metric spaces can be thought of as very basic spaces, with only a few axioms, where 
the ideas of convergence and continuity exist. The fundamental ingredient that is 
needed to make these concepts rigorous is that of a distance, also called a metric, 
which is a measure of how close elements are to each other. 


Definition 2.1 


A distance (or metric) on a set X is a function


$$d : X^2 \to \mathbb{R}^+, \qquad (x, y) \mapsto d(x, y)$$


such that the following properties (called axioms) hold for all $x, y, z \in X$,

(i) $d(x, y) \leq d(x, z) + d(z, y)$ (Triangle Inequality),
(ii) $d(y, x) = d(x, y)$ (Symmetry),
(iii) $d(x, y) = 0 \Leftrightarrow x = y$.


A metric space is not just a set, in which the elements have no relation to each 
other, but a set X equipped with a particular structure, its distance function d. One 
can emphasize this by denoting the metric space by the pair (X, d), although it is 
more convenient to denote different metric spaces by different symbols such as X, Y, 
etc. 
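
The three axioms can be spot-checked by machine on a finite sample of points. Such a check can refute a candidate distance but never prove that it is one. The sketch below is illustrative only; the function `is_metric_on`, the tolerance `tol` and the sample points are assumptions introduced here.

```python
from itertools import product

def is_metric_on(points, d, tol=1e-12):
    """Check the three axioms of Definition 2.1 on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:      # (i) triangle inequality
            return False
        if abs(d(x, y) - d(y, x)) > tol:           # (ii) symmetry
            return False
        if (d(x, y) <= tol) != (x == y):           # (iii) d(x, y) = 0 iff x = y
            return False
    return True

sample = [-2.0, -0.5, 0.0, 1.0, 2.0]

# The standard distance on R passes the check on this sample.
print(is_metric_on(sample, lambda a, b: abs(a - b)))

# |a^2 - b^2| fails axiom (iii): it vanishes for the distinct points -2 and 2.
print(is_metric_on(sample, lambda a, b: abs(a * a - b * b)))
```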

In what follows, X will denote an abstract set with a distance, not necessarily $\mathbb{R}$
or $\mathbb{R}^N$, although these are of the most immediate interest. We still call its elements
“points”, whether they are in reality geometric points, sequences, or functions. What 
matters, as far as metric spaces are concerned, is not the internal structure of its 
points, but their outward relation to other points. 
Maurice Fréchet (1878-1973) studied under Hadamard (who
had proved the prime number theorem and had succeeded
Poincaré) and Borel at the University of Paris (École Normale
Supérieure); his 1906 thesis developed “abstract analysis”, an
axiomatic approach to abstract functions that allows the
Euclidean concepts of convergence and distance, as well as the
usual algebraic operations, to be applied to functions. Many
terms, such as metric space, completeness, compactness etc.,
are due to him.


Fig. 2.1 Fréchet 


Although most distance functions treated in this book are of the type $d(x, y) = |x - y|$,
as for $\mathbb{R}$, the point of studying metric spaces in more generality is not only
that there are some exceptions that don’t fit this type, but also to emphasize that 
addition/subtraction is not essential, as well as to prepare the groundwork for even 
more general spaces, called topological spaces, in which pure convergence is studied 
without reference to distances (but which are not covered in this book). 

There are two additional axioms satisfied by some metric spaces that merit particular
attention: complete metrics, which guarantee that their Cauchy sequences
converge, and separable metric spaces whose elements can be handled by approximations.
Both properties are possessed by compact metric spaces, which is what
is often meant when the term “finite” is applied in a geometric sense. These are
considered in later sections.


Easy Consequences

1. $d(x, z) \geq |d(x, y) - d(z, y)|$.


2. If $x_1, \ldots, x_n$ are points in X, then by induction on n,

$$d(x_1, x_n) \leq d(x_1, x_2) + \cdots + d(x_{n-1}, x_n).$$


Examples 2.2 


1. The spaces $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ have the standard distance $d(a, b) := |a - b|$. Check
that the three axioms for a distance are satisfied, making use of the in/equalities
$|s + t| \leq |s| + |t|$, $|-s| = |s|$, and $|s| = 0 \Leftrightarrow s = 0$.


2. » The vector spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ have the standard Euclidean distance defined by
$d(x, y) := \sqrt{\sum_{i=1}^{N} |a_i - b_i|^2}$ for $x = (a_1, \ldots, a_N)$, $y = (b_1, \ldots, b_N)$ (prove
this for N = 2).


3. One can define distances on other more general spaces, e.g. we will later show
that the space of real continuous functions f with domain [0, 1] has a distance
defined by $d(f, g) := \max_{x \in [0, 1]} |f(x) - g(x)|$.


4. < The space of ‘shapes’ in $\mathbb{R}^2$ (roughly speaking, subsets that have an area) has
a metric d(A, B) defined as the area of $(A \cup B) \setminus (A \cap B)$.
5. » Any subset of a metric space is itself a metric space (with the ‘inherited’ or
‘induced’ distance). (The three axioms are such that they remain valid for points
in a subset of a metric space.)


6. » The product of two metric spaces, $X \times Y$, can be given several distances, none
of which have a natural preference. Two of them are the following


$$D_1\left(\binom{x_1}{y_1}, \binom{x_2}{y_2}\right) := d_X(x_1, x_2) + d_Y(y_1, y_2),$$
$$D_\infty\left(\binom{x_1}{y_1}, \binom{x_2}{y_2}\right) := \max\bigl(d_X(x_1, x_2),\, d_Y(y_1, y_2)\bigr).$$


For convenience, we choose $D_1$ as our standard metric for $X \times Y$, except for $\mathbb{R}^N$
and $\mathbb{C}^N$, for which we take the Euclidean one.


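Proof for $D_1$: Positivity of $D_1$ and axiom (ii) are obvious. To prove axiom (iii),
$D_1(\boldsymbol{x}_1, \boldsymbol{x}_2) = 0$ implies $d_X(x_1, x_2) = 0 = d_Y(y_1, y_2)$, so $x_1 = x_2$, $y_1 = y_2$, and
$\boldsymbol{x}_1 = \binom{x_1}{y_1} = \binom{x_2}{y_2} = \boldsymbol{x}_2$. As for the triangle inequality,


$$\begin{aligned}
D_1(\boldsymbol{x}_1, \boldsymbol{x}_2) &= d_X(x_1, x_2) + d_Y(y_1, y_2) \\
&\leq d_X(x_1, x_3) + d_X(x_3, x_2) + d_Y(y_1, y_3) + d_Y(y_3, y_2) \\
&= D_1(\boldsymbol{x}_1, \boldsymbol{x}_3) + D_1(\boldsymbol{x}_3, \boldsymbol{x}_2).
\end{aligned}$$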
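
The two product distances of Example 6 are simple to compute once the distances on the factors are given. The following is a minimal sketch, assuming both factors are $\mathbb{R}$ with the standard distance; the names `d_real`, `D1` and `Dinf` are chosen here for illustration.

```python
def d_real(a, b):
    """Standard distance on R."""
    return abs(a - b)

def D1(p, q, dX=d_real, dY=d_real):
    """Sum distance on X x Y for points p = (x1, y1) and q = (x2, y2)."""
    return dX(p[0], q[0]) + dY(p[1], q[1])

def Dinf(p, q, dX=d_real, dY=d_real):
    """Max distance on X x Y."""
    return max(dX(p[0], q[0]), dY(p[1], q[1]))

p, q = (0, 0), (3, 4)
# Prints 7 and 4; the Euclidean distance between the same points would be 5.
print(D1(p, q), Dinf(p, q))
```

The three values 7, 5 and 4 for the same pair of points show that the choice of distance on a product is a genuine choice, even though all three agree on which points coincide.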


Exercises 2.3 
1. Show that if $d(x, z) > d(z, y)$ then $x \neq y$.
2. Write in mathematical language, 


(a) The subsets A, B are close to within 2 distance units; 


(b) A and B are arbitrarily close. 


3. The set of bytes, i.e., sequences of 0s and 1s (bits) of length 8 (or any length),
has a “Hamming distance” defined as the number of bits where two bytes differ; 
e.g. the Hamming distance between 10010111 and 11001101 is 4. 


4. Any non-empty set can be given a distance function. The simplest is the discrete
metric

$$d(x, y) := \begin{cases} 1 & x \neq y \\ 0 & x = y \end{cases}.$$

Indeed, there are infinitely many other metrics on the same set (except when there is
only one point!); for example, if d is a distance function then so are 2d and d/(1 + d).

(* Not every function of d will do though! The function $d^2$ is not generally a
metric; what properties does $f : \operatorname{im} d \to \mathbb{R}^+$ need to have in order that $f \circ d$
also be a metric?)


5. A set may have several distances defined on it, but each has to be considered as
a different metric space. For example, the set of positive natural numbers has a
distance defined by $d(m, n) := |1/m - 1/n|$ (prove!); the metric space associated
with it has very different properties from $\mathbb{N}$ with the standard Euclidean distance.
For example, in this space one can find distinct natural numbers that are arbitrarily
close to each other.


6. Let $n = \pm 2^k 3^r \cdots$ be the prime decomposition of any $n \in \mathbb{Z}$ and define $|n|_2 := 1/2^k$,
$|0|_2 := 0$. Show that $|\cdot|_2$ satisfies the same properties as the standard
absolute value and hence that $d(m, n) := |m - n|_2$ is a distance on $\mathbb{Z}$ (called the
2-adic metric).


7. * Given the distances between n points in $\mathbb{R}^N$, can their positions be recovered?
Can their relative positions be recovered? 
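
Several of the distances in these exercises are easy to experiment with before attempting the proofs. The sketch below is only an aid to intuition and not a solution. The function names (`hamming`, `discrete`, `bounded`, `abs2`, `d2`) are assumptions introduced here, and bytes are represented as strings of the characters 0 and 1.

```python
from fractions import Fraction

def hamming(a: str, b: str) -> int:
    """Exercise 3: number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def discrete(x, y) -> int:
    """Exercise 4: the discrete metric on any set."""
    return 0 if x == y else 1

def bounded(d):
    """Exercise 4: turn a distance d into the distance d / (1 + d)."""
    return lambda x, y: d(x, y) / (1 + d(x, y))

def abs2(n: int) -> Fraction:
    """Exercise 6: 2-adic absolute value, 1 / 2^k where 2^k is the largest power of 2 dividing n."""
    if n == 0:
        return Fraction(0)
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return Fraction(1, 2 ** k)

def d2(m: int, n: int) -> Fraction:
    """Exercise 6: the 2-adic distance on Z."""
    return abs2(m - n)

print(hamming("10010111", "11001101"))      # 4, as stated in Exercise 3
print(abs2(12), abs2(7), d2(1, 9))          # 1/4, 1, 1/8
```

In the 2-adic distance the integers 1 and 9 are closer to each other than 1 and 2 are, which illustrates how different a metric space on the same set $\mathbb{Z}$ can be.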


2.1 Balls and Open Sets 


The distance function provides an idea of the “surroundings” of a point. Given a 
point a and a number r > 0, we can distinguish between those points ‘near’ to it, 
satisfying d(x, a) < r, and those that are not.


Definition 2.4 


An (open) ball, with center a and radius r > 0, is the set 
B_r(a) := { x ∈ X : d(x, a) < r }.


Despite the name, we should lay aside any preconception we may have of it being 
“round” or symmetric. We are now ready for our first, simple, proposition: 


Proposition 2.5 


Distinct points of a metric space can be separated by disjoint balls,


x ≠ y ⇒ ∃r > 0, B_r(x) ∩ B_r(y) = ∅.


Proof If x ≠ y then d(x, y) > 0 by axiom (iii). Letting r := d(x, y)/2, then B_r(x)
is disjoint from B_r(y), else we get a contradiction,

z ∈ B_r(x) ∩ B_r(y) ⇒ d(x, z) < r AND d(y, z) < r
 ⇒ d(x, y) ≤ d(x, z) + d(y, z)
 < 2r = d(x, y). □

Examples 2.6
1. In R, every ball is an open interval
B_r(a) = { x ∈ R : |x − a| < r } = ]a − r, a + r[.

Conversely, any open interval of the type ]a, b[ is a ball in R, namely
B_{|b−a|/2}((a + b)/2).


2. In R², the ball B_r(a) is the disk with center a and radius r without the circular
perimeter.


3. In Z, B_{1/2}(m) = { n ∈ Z : |n − m| < 1/2 } = { m } and B_2(m) = { m − 1, m, m + 1 }.


4. It is clear that balls differ depending on the context of the metric space; thus
B_{1/2}(0) = ]−1/2, 1/2[ in R, but B_{1/2}(0) = { 0 } in Z.
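To see how the same definition produces different balls in different spaces, one can compute balls directly on a finite window of Z. This is a minimal sketch; `ball`, `euclid` and `discrete` are illustrative names, and the window range(-5, 6) is an assumption that merely needs to be wide enough for the radii used.

```python
def ball(space, d, center, radius):
    """Open ball B_r(a) = {x in space : d(x, a) < r}, computed on a finite sample of the space."""
    return {x for x in space if d(x, center) < radius}

def euclid(x, y):
    """Standard distance |x - y|."""
    return abs(x - y)

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

integers = range(-5, 6)   # finite window of Z

print(sorted(ball(integers, euclid, 0, 0.5)))     # [0]
print(sorted(ball(integers, euclid, 3, 2)))       # [2, 3, 4]
print(sorted(ball(integers, discrete, 0, 1)))     # [0]
print(ball(integers, discrete, 0, 1.5) == set(integers))  # True
```

With the discrete metric every ball is either a single point or the whole space, which is about as far from "round" as a ball can be.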


Open Sets 


We can use balls to explore the relation between a point x and a given set A. As the
radius of the ball B_r(x) is increased, one is certain to include some points which
are in A and some points which are not, unless A = X or A = ∅. So it is more
interesting to investigate what can happen when the radius is small. There are three
possibilities as r is decreased: either B_r(x) contains (i) only points of A, or (ii) only
points in its complement Aᶜ, or (iii) points of both A and Aᶜ, no matter how small
we take r.


Definition 2.7 


A point x of a set A is called an interior point of A when it can be “surrounded
completely” by points of A, i.e.,

∃r > 0, B_r(x) ⊆ A.

In this case, A is also said to be a neighborhood of x.
A point x (not in A) is an exterior point of A when

∃r > 0, B_r(x) ⊆ X∖A.

All other points are called boundary points of A.

Accordingly, the set X is partitioned into three parts: the interior A° of A, its
exterior (Ā)ᶜ, and its boundary ∂A. The set of interior and boundary points of
A is called the closure of A and denoted by Ā := A° ∪ ∂A.

A set A is open in X when all its points are interior points of it, i.e., A = A°
(Fig. 2.2).

A small enough ball around an interior point


Fig. 2.2. The distinction between interior, boundary, and exterior points 


Examples 2.8 


1. In R, the intervals ]a, b[, [a, b[, ]a, b], and [a, b] have the same interior ]a, b[,
exterior, and boundary { a, b }; their closure is [a, b].

Proof For any a < x < b, let 0 < ε < min(x − a, b − x); then a < x − ε <
x + ε < b, that is, B_ε(x) ⊆ ]a, b[; this makes x an interior point of the interval.

For x < a, there is an ε with 0 < ε < a − x such that x ∈ B_ε(x) ⊆ ]−∞, a[ ⊆ R∖[a, b].
Similarly, any x > b is an exterior point of the interval.

For x = a, any small interval B_ε(a) contains points such as a + ε/2, that are
inside the interval, and points outside it, such as a − ε/2, making a (and similarly b)
a boundary point.


2. ▸ The following sets are open in any metric space X:

(a) X∖{ x } for any point x. The reason is that any other point y ≠ x is separated
from x by disjoint balls (our first proposition); this makes y an interior point
of X∖{ x }.

(b) The empty set is open by default, because it does not contain any point. The
whole space X is also open because B_r(x) ⊆ X for any r > 0 and x ∈ X.

(c) Balls are open sets in any metric space.

Proof Let x ∈ B_r(a) be any point in the given ball, meaning d(x, a) < r. Let
ε := r − d(x, a) > 0; then B_ε(x) ⊆ B_r(a) since for any y ∈ B_ε(x),

d(y, a) ≤ d(y, x) + d(x, a) < ε + d(x, a) = r.


3. ▸ The least upper bound of a set A in R is a boundary point of it.


Proof Let α be the least upper bound of A. For any ε > 0, α + ε/2 is an upper
bound of A but does not belong to it (else α would not be an upper bound).
Even if α ∉ A, the interval ]α − ε/2, α[ cannot be devoid of elements of
A, otherwise α would not be the least upper bound. So the neighborhood B_ε(α)
contains elements of both A and Aᶜ.


Proposition 2.9 


The set of interior points A° is the largest open set inside A. 


Proof If B ⊆ A then the interior points of B are obviously interior points of A,
so B° ⊆ A°. In particular every open subset of A lies inside A° (because B = B°),
and every (open) ball in A lies in A°. This implies that if B_r(x) ⊆ A then
B_r(x) ⊆ A°, so that every interior point of A is surrounded by other interior
points, and A° is open. □


Proposition 2.10 


A set A is open ⇔ A is the union of balls.


Proof Let A be an open set. Then every point of it is interior, and can be covered by
a ball B_{r(x)}(x) ⊆ A. Taking the union over all the points of A gives

A = ⋃_{x∈A} { x } ⊆ ⋃_{x∈A} B_{r(x)}(x) ⊆ A,

forcing A = ⋃_{x∈A} B_{r(x)}(x), a union of balls.

Now let A := ⋃_i B_{r_i}(a_i) be a union of balls, and let x be any point in A.
Then x is in at least one of these balls, say, B_r(a). But balls are open and hence
x ∈ B_ε(x) ⊆ B_r(a) ⊆ A. Therefore A consists of interior points and so is open. □


The early years of research in metric spaces have shown that most of the basic
theorems about metric spaces can be deduced from the following characteristic
properties of open sets:


Theorem 2.11 


(i) Any union of open sets is open.
(ii) The finite intersection of open sets is open.

Proof (i) Consider the union of open sets, ⋃_i A_i. Any x ∈ ⋃_i A_i must lie in at least
one of the open sets, say A_j. Therefore,

x ∈ B_r(x) ⊆ A_j ⊆ ⋃_i A_i

shows that it must be an interior point of the union.

(ii) It is enough, using induction (show!), to consider the intersection of two open
sets A ∩ B. Let x ∈ A ∩ B, meaning x ∈ A and x ∈ B, with both sets being open.
Therefore there are open balls B_{r₁}(x) ⊆ A and B_{r₂}(x) ⊆ B. The smaller of these
two balls, with radius r := min(r₁, r₂), must lie in A ∩ B,

x ∈ B_r(x) = B_{r₁}(x) ∩ B_{r₂}(x) ⊆ A ∩ B. □


Examples 2.12
1. ▸ The exterior (Ā)ᶜ = (Aᶜ)° of a subset A is open in X.
2. A° = A∖∂A. So a set is open ⇔ it does not contain any boundary points.


3. Let Y ⊆ X inherit X’s distance. Then A is open in Y if, and only if, A = U ∩ Y
for some subset U open in X.


Proof Care must be taken to distinguish balls in Y from those in X: B^Y_r(x) =
B^X_r(x) ∩ Y. If A is open in Y, then by Proposition 2.10,

A = ⋃_{a∈A} B^Y_{r_a}(a) = ⋃_{a∈A} B^X_{r_a}(a) ∩ Y = U ∩ Y.

For the converse, interior points of U ⊆ X which happen to be in Y are interior
points of A as a subset of Y,

y ∈ B^X_r(y) ⊆ U ⇒ y ∈ B^X_r(y) ∩ Y ⊆ U ∩ Y = A.


Limit Points 


It may happen that a point a of a set A is surrounded by points not in A, that is, there
is a ball B_r(a) which contains no points of A other than a itself. We call such points
isolated points. The property that a point cannot be isolated from the rest of A is
captured by the following definition:

Definition 2.13


A point b (not necessarily in A) is a limit point of a set A when every ball
around it contains other points of A,


∀ε > 0, ∃a ≠ b, a ∈ A ∩ B_ε(b).


Thus every point of A is either a limit point or an isolated point of A.


Exercises 2.14 


1. In R, the set { a } has no interior points, a single boundary point a, and all other
points are exterior. It is not an open set in R. There are ever smaller open sets
that contain a, but there is no smallest one.

2. In R, the closure of { 1/n : n ∈ N } is { 1/n : n ∈ N } ∪ { 0 }.

3. The set Q, and also its complement, the set of irrational numbers Qᶜ, do not
have interior (or exterior) points in R. Every real number is a boundary point of
Q.

Similarly every complex number is a boundary point of Q + iQ.

4. The set { m } in Z does not have any boundary points; it is an open set in Z
(B_{1/2}(m) = { m }).

▸ Notice that whether a point is in the interior (or boundary, or exterior) of a set
depends on the metric space under consideration. For example, { m } is open in N
but not open in R; the interval ]a, b[ is open in R, but not open when considered
as a subset of the x-axis in R². We thus need to specify that a set A is open in X.

5. Describe the interior, boundary and exterior of the sets

{ (x, y) ∈ R² : |x| + |y| < 1 }, { (x, y) ∈ R² : 1/2 < max(|x|, |y|) < 1 }.

6. Of the proper intervals in R, only ]a, b[, ]a, ∞[, and ]−∞, a[ are open.

7. In R², the half-plane { (x, y) ∈ R² : y > 0 } and the rectangles ]a, b[ × ]c, d[ :=
{ (x, y) ∈ R² : a < x < b, c < y < d } are open sets.

8. ▸ Aᶜ has the same boundary as A; its interior is the exterior of A, that is,
(Ā)ᶜ = (Aᶜ)° (and Ā = ((Aᶜ)°)ᶜ); so ∂A is the intersection of Ā with the closure of Aᶜ.

9. Find an open subset of R, apart from R itself, without an exterior.

So, the exterior of the exterior of A need not be the interior of A. Similarly, the
boundary of Ā or A° need not equal the boundary of A.

10. ▸ An infinite intersection of open sets need not be open. For example, in R, the
open intervals ]−1/n, 1/n[ are nested one inside another. Their intersection is
the non-open set { 0 } (prove!). Find another example in R².

11. Deduce from the theorem that if every { x } is open in X, then every subset of X
is open in X. This ‘extreme’ property is satisfied by N, and also by any discrete
metric space.

12. Any point x with d(x, a) > r is in the exterior of the open ball B_r(a). But the
boundary of B_r(a) need not be the set { x : d(x, a) = r }. Find a counterexample
in Z.

13. * Every open set in R is a countable disjoint union of open intervals. (Hint: An
open set in R is the disjoint union of open intervals; take a rational interior point
for each.)

In contrast to this simple case, the open sets in R², say, can be much more
complicated; there is no simple characterization of them, apart from the definition.

14. Can a set not have limit points? Can an infinite set not have limit points?

15. In R, the set of integers Z has no limit points, but all real numbers are limit points
of Q.

16. (a) 1 is an interior isolated point of { 1, 2 } in Z;

(b) 1 is a boundary isolated point of { 1, 2 } in R;

(c) 1 is an interior limit point of [0, 2] in R;

(d) 1 is a boundary limit point of [0, 1] in R.

17. In R and Q, an isolated point of a subset must be a boundary point, or, equivalently,
an interior point is a limit point.


2.2 Closed Sets 


Definition 2.15 


A set F is closed in a space X when X∖F is open in X.


Proposition 2.16 


A set F is closed ⇔ F contains its boundary ⇔ F̄ = F.


Proof We have already seen that the boundary of a set F and of its complement Fᶜ
are the same (because the interior of Fᶜ is the exterior of F). So F is closed, and
Fᶜ open, precisely when this common boundary does not belong to Fᶜ, but belongs
instead to (Fᶜ)ᶜ = F. □

Examples 2.17


1. In R, the set [a, b] is closed, since R∖[a, b] = ]−∞, a[ ∪ ]b, ∞[ is the union of
two open sets, hence itself open. Similarly [a, ∞[ and ]−∞, a] are closed in R.


2. N and Z are closed in R, but Q is not.


3. ▸ In any metric space X, the following sets are closed in X (by inspecting their
complements):

(a) the singleton sets { x },

(b) the ‘closed balls’ B_r[a] := { x ∈ X : d(x, a) ≤ r },

(c) X and ∅,

(d) the boundary of any set (the complement of ∂A is A° ∪ (Ā)ᶜ).


4. ▸ The complement of an open set is closed. More generally, if U is an open set
and F a closed set in X, then U∖F is open and F∖U is closed. The reasons are
that U∖F = U ∩ Fᶜ and (F∖U)ᶜ = Fᶜ ∪ U.


Closed sets are complements of open ones, and their properties reflect this: 


Proposition 2.18 


The finite union of closed sets is closed. 
Any intersection of closed sets is closed. 


Proof These are the complementary results for open sets (Theorem 2.11). For F, G
closed sets in X, Fᶜ, Gᶜ are open, so the result follows from

(F ∪ G)ᶜ = Fᶜ ∩ Gᶜ, (⋂_i F_i)ᶜ = ⋃_i F_iᶜ,

and the definition that the complement of a closed set is open. □


Theorem 2.19 Kuratowski’s closure ‘operator’ 


Ā is the smallest closed set containing A, called the closure of A.


A ⊆ B ⇒ Ā ⊆ B̄; the closure of Ā is Ā; the closure of A ∪ B is Ā ∪ B̄.


Proof The complement of Ā is the exterior of A, which is an open set, so Ā is closed.
This implies that the closure of Ā is Ā itself (Proposition 2.16).

If A ⊆ B, then an exterior point of B is obviously an exterior point of A, that
is, (B̄)ᶜ ⊆ (Ā)ᶜ; so Ā ⊆ B̄. It follows that if F is any closed set that contains A,
then Ā ⊆ F̄ = F, and this shows that Ā is the smallest closed set containing A.
(Alternatively, Proposition 2.9 can be used: how?)

Of course, Ā is contained in the closure of A ∪ B because A ⊆ A ∪ B; combined with
the same statement for B̄, it follows that Ā ∪ B̄ is contained in the closure of A ∪ B.
Moreover, Ā ∪ B̄ is a closed set which contains A ∪ B, and so must contain the
closure of A ∪ B. □


Exercises 2.20 


1. It is easy to find sets in R which are neither open nor closed (so contain only 
part of their boundary). Can you find any that are both open and closed? 
The terms “open” and “closed” are misnomers, but they have stuck in the liter- 
ature, being derived from the earlier use of “open/closed intervals”. 


2. The set { x ∈ Q : x² < 2 } is closed, and open, in Q.

3. In any metric space, a finite collection of points { a₁, …, a_N } is a closed set.

4. The following sets are closed in R: [0, 1] ∪ { 5 }, ⋃_{n=0}^∞ [n, n + 1/2].

5. The infinite union of closed sets may, but need not, be closed. For example, the
set ⋃_{n=1}^∞ { 1/n } is not closed in R; which boundary point is not contained in it?


6. Find two disjoint closed sets (in R² or Q, say) that are arbitrarily close to each
other.


7. Start with the closed interval [0, 1]; remove the open middle interval ]1/3, 2/3[
to get two closed intervals [0, 1/3] ∪ [2/3, 1]. Remove the middle interval of each
of these intervals to obtain four closed intervals
[0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].
If we continue this process indefinitely we end up with the Cantor set. Show it is a
closed set.


8. Denote the decimal expansion of any number in [0, 1] by 0.n₁n₂n₃…. Show
that


{ x ∈ [0, 1] : x = 0.n₁n₂n₃… ⇒ n_k < 5 ∀k }


is closed in R.


9. ▸ One can define the “distance” between a point and a subset of a metric space
by d(x, A) := inf_{a∈A} d(x, a). Then x ∈ Ā exactly when d(x, A) = 0.


10. Let x be an exterior point of A, and let y ∈ A have the least distance between x
and A. Do you think that y is unique? or that it must be on the boundary of A?
Prove or disprove. For starters, take the metric space to be R².

11. Show that the closure of A ∩ B is contained in Ā ∩ B̄, but that equality need not
hold. Indeed two disjoint sets may ‘touch’ at a common boundary point.

12. Show the complementary results of the theorem: A° ∩ B° = (A ∩ B)°, A°° = A°.

13. If A ⊆ B, does it follow that A° ⊆ B?
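The construction of the Cantor set in Exercise 7 can be carried out with exact rational arithmetic. The sketch below is illustrative only: `cantor_stage` is a hypothetical helper name, and it lists the closed intervals left after a given number of removal rounds rather than the Cantor set itself, which is the intersection of all the stages.

```python
from fractions import Fraction

def cantor_stage(n: int):
    """Closed intervals remaining after n rounds of removing open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # left closed third
            next_intervals.append((b - third, b))   # right closed third
        intervals = next_intervals
    return intervals

for a, b in cantor_stage(2):
    print(f"[{a}, {b}]")
# [0, 1/9]
# [2/9, 1/3]
# [2/3, 7/9]
# [8/9, 1]
```

Each stage is a finite union of closed intervals, which suggests one route to the exercise via Proposition 2.18.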


Dense Subsets 


We often need to approximate an element x € X to within some small distance € 
by an element from some special subset A C X. The elements of A may be simpler 
to describe, or more practical to work with, or may have nicer theoretical qualities. 
For example, computers cannot handle arbitrary real numbers and must approximate 
them by rational ones; polynomials are easier to work with than general continuous 
functions. The property that elements of a set A can be used to approximate elements 
of X to within any €, namely, 


∀x ∈ X, ∀ε > 0, ∃a ∈ A, d(x, a) < ε,


is equivalent to saying that any ball B_ε(x) contains elements of A, in other words A
has no exterior points.
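For the rationals inside R, the approximation property can be made constructive. The following sketch, with the hypothetical helper `rational_between`, finds a rational strictly between two given numbers x < y using the Archimedean property; floats are converted to exact fractions first, so the inputs stand in for real numbers only up to machine precision.

```python
from fractions import Fraction
from math import floor, sqrt

def rational_between(x, y) -> Fraction:
    """Return a rational a with x < a < y, assuming x < y."""
    x, y = Fraction(x), Fraction(y)      # exact values of the inputs
    if not x < y:
        raise ValueError("need x < y")
    n = floor(1 / (y - x)) + 1           # Archimedean step: n > 1/(y - x), so 1/n < y - x
    m = floor(n * x) + 1                 # smallest integer with m > n*x
    return Fraction(m, n)                # x < m/n <= x + 1/n < y

print(rational_between(Fraction(1, 3), Fraction(1, 2)))   # 3/7
a = rational_between(sqrt(2), 1.5)
print(a, sqrt(2) < a < 1.5)
```

Taking y := x + ε (or x := y − ε) gives a rational within ε of any prescribed number, which is the density statement in the form used above.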


Definition 2.21 


A set A is dense in X when Ā = X (so Ā contains all balls).
A set A is nowhere dense in X when Ā contains no balls.


Exercises 2.22 


1. ▸ Q is dense in R. (This is equivalent to the Archimedean property of R.) More
generally, a set A is dense in R when for any two distinct real numbers x < y,
there is an element a ∈ A between them, x < a < y.


2. The intersection of two open dense sets is again open and dense.


3. A finite union of straight lines in R² is nowhere dense. Z and the Cantor set are
nowhere dense in R.


4. Nowhere dense sets have no interior points.


5. A is nowhere dense in X ⇔ X∖Ā is dense in X ⇔ Ā is the boundary of an
open set.


6. * What are the nowhere dense sets in R? (Hint: Exercise 2.14(13))

Remarks 2.23


1. If d(x, y) = 0 does not guarantee x = y, but d satisfies the other two axioms,
then it is called a pseudo-distance. In this case, let us say that points x and y
are indistinguishable when d(x, y) = 0 (⇔ ∀z, d(x, z) = d(y, z)). This is
an equivalence relation, which induces a partition of the space into equivalence
classes [x]. The function D([x], [y]) := d(x, y) is then a legitimate well-defined
metric.


In a similar vein, if d satisfies the triangle inequality, but is not symmetric, then
D(x, y) := d(x, y) + d(y, x) is symmetric and still satisfies the triangle inequality.

Positivity of d follows from axioms (i) and (ii), d(x, y) ≥ |d(x, z) − d(y, z)| ≥ 0.


2. The axioms for a distance can be re-phrased as axioms for balls:


(a) B_0(x) = ∅, ⋂_{r>0} B_r(x) = { x }, ⋃_{r>0} B_r(x) = X,
(b) { y : x ∈ B_r(y) } = B_r(x),
(c) B_s ∘ B_r(x) ⊆ B_{r+s}(x), i.e., if y ∈ B_s(z) where z ∈ B_r(x) then y ∈ B_{r+s}(x).


3. The concept of open sets is more basic than that of distance. One can give a set X
a collection of open sets satisfying the properties listed in Theorem 2.11 (taken
as axioms), and study them without any reference to distances. It is then called a
topological space; most theorems about metric spaces have generalizations that
hold for topological spaces. There are some important topological spaces that are
not metric spaces, e.g. the arbitrary product of metric spaces ∏_i X_i, and spaces
of functions X^Y := { f : Y → X }.


Chapter 3 
Convergence and Continuity 


3.1 Convergence 


The previous chapter was primarily intended to expand our vocabulary of 
mathematical terms in order to better describe and clarify the concepts that we will 
need. Our first task is to define convergence. 


Definition 3.1 


A sequence (x_n) in a metric space X converges to a limit x, written
x_n → x as n → ∞, when

∀ε > 0, ∃N, n > N ⇒ x_n ∈ B_ε(x).


A sequence which does not converge is said to diverge. 


One may express this as “any neighborhood of x contains all the sequence from 
some point onwards,” or “eventually, the sequence points get arbitrarily close to the 
limit”. 
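The quantifiers in the definition can be tried out on a concrete case. The sketch below takes x_n = n/(n + 1) in R with limit 1 as a sample sequence; the names `x_n`, `N_for` and `tail_in_ball` are illustrative, and the check over a finite range of indices is a spot check of the definition, not a proof of convergence.

```python
from fractions import Fraction
from math import floor

def x_n(n: int) -> Fraction:
    """Sample sequence x_n = n/(n + 1), computed exactly."""
    return Fraction(n, n + 1)

def N_for(eps: Fraction) -> int:
    """An index N with 1/N < eps; here |x_n - 1| = 1/(n + 1) < 1/N whenever n > N."""
    return floor(1 / eps) + 1

def tail_in_ball(seq, limit, eps, N, horizon=1000) -> bool:
    """Test n > N => |seq(n) - limit| < eps for N < n <= N + horizon (finite spot check)."""
    return all(abs(seq(n) - limit) < eps for n in range(N + 1, N + 1 + horizon))

for eps in (Fraction(1, 10), Fraction(1, 1000)):
    N = N_for(eps)
    print(eps, N, tail_in_ball(x_n, 1, eps, N))
```

Choosing N too small makes the check fail, for instance `tail_in_ball(x_n, 1, Fraction(1, 1000), 5)` is False, which shows that N has to depend on ε.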


Proposition 3.2 


In a metric space, a sequence (x_n) can only converge to one limit, denoted
lim_{n→∞} x_n.


Proof Suppose x_n → x and x_n → y as n → ∞, with x ≠ y. Then they can be
separated by two disjoint balls B_r(x) and B_r(y) (Proposition 2.5). But convergence
means

∃N₁, n > N₁ ⇒ x_n ∈ B_r(x),
∃N₂, n > N₂ ⇒ x_n ∈ B_r(y).

For n > max(N₁, N₂) this would result in x_n ∈ B_r(x) ∩ B_r(y) = ∅, a
contradiction. □

Felix Hausdorff (1868-1942) studied atmospheric refraction in
Bessel’s school at Leipzig in 1891. In 1914, at 46 years of age
in the University of Bonn, he published his major work on set
theory, with chapters on partially ordered sets, measure spaces,
topology and metric spaces, where he built upon Fréchet’s abstract
spaces, using open sets and neighborhoods. Later, in
1919, he introduced fractional dimensions. But in the late 1930s,
increasing Nazi persecution made life impossible for him.

Fig. 3.1 Hausdorff

Examples 3.3 


1. In any metric space, x_n → x ⇔ d(x_n, x) → 0 as n → ∞ (because
x_n ∈ B_ε(x) ⇔ d(x_n, x) < ε). For example, x_n → x when d(x_n, x) < 1/n
holds.


2. In R, n/(n + 1) → 1 as n → ∞, since for any ε > 0, there is an N such that 1/N < ε
(Archimedean property of R), so

n > N ⇒ |1 − n/(n + 1)| = 1/(n + 1) < 1/N < ε.

3. Given two convergent real sequences a_n → a and b_n → b, then a_n + b_n → a + b.
Proof For any ε > 0, there are N₁, N₂, such that
n > N₁ ⇒ |a_n − a| < ε, n > N₂ ⇒ |b_n − b| < ε.
Thus for n > max(N₁, N₂),
|(a_n + b_n) − (a + b)| ≤ |a_n − a| + |b_n − b| < 2ε.


4. ▸ A sequence (x_n, y_n) in X × Y converges to (x, y) if, and only if, x_n → x and y_n → y.


Proof Any distance in Example 2.2(6) can be used, but we will use the standard
metric here. The distance between (x_n, y_n) and (x, y) is

δ := d((x_n, y_n), (x, y)) = d(x_n, x) + d(y_n, y) → 0, as n → ∞.

As both d(x_n, x) and d(y_n, y) are at most δ, the converse follows.

5. Consider a composition of functions N → N → X where the first function is 1-1,
and the second is a sequence. A subsequence is the case when the first function is
strictly increasing, and a rearrangement is the case when it is 1-1 and onto. For
example, 1, 1/4, 1/9, … is a subsequence of (1/n), and 1/2, 1, 1/4, 1/3, … is a
rearrangement. Any such ‘sub-selection’ of a convergent sequence also converges,
to the same limit.


Proof Suppose n > N ⇒ d(x_n, x) < ε. Let (x_{n_i}) be a sub-selection of (x_n).
As n_i ≤ N can only be true of a finite number of indices i, with the largest, say,
M, it follows that

i > M ⇒ n_i > N ⇒ d(x_{n_i}, x) < ε.


6. A sequence converges fast (or ‘linearly’) when d(x_n, x) < Ac^n for some real
constants A > 0, 0 < c < 1. Quadratic convergence, d(x_n, x) < Ac^(2^n), is even
faster. Instead 1/n and ⁿ√2 converge slowly.
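The difference between these rates is easy to see by counting how many terms are needed before the bound drops below a tolerance. This is a rough sketch: `first_index_below` is a hypothetical helper, and the constants A = 1, c = 1/2 and ε = 10⁻³ are arbitrary illustrative choices.

```python
def first_index_below(bound, eps, max_n=10**6):
    """Smallest n >= 1 with bound(n) < eps, or None if not reached by max_n."""
    for n in range(1, max_n + 1):
        if bound(n) < eps:
            return n
    return None

A, c, eps = 1.0, 0.5, 1e-3

linear = first_index_below(lambda n: A * c**n, eps)          # bound A*c^n
quadratic = first_index_below(lambda n: A * c**(2**n), eps)  # bound A*c^(2^n)
slow = first_index_below(lambda n: 1.0 / n, eps)             # bound 1/n

print(linear, quadratic, slow)   # 10 4 1001
```

Tightening ε by a factor of 1000 adds about ten steps in the linear case, roughly one step in the quadratic case, and multiplies the count by 1000 for 1/n.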


Limits and Closed Sets 


There are many questions in analysis of the type: If x_n has a property A, and x_n → x,
does x still have this property? For example, if a convergent sequence of vectors in
the plane lies on a circle, will its limit also lie on the same circle? Or, can continuous
functions (or differentiable, or integrable, etc.) converge to a discontinuous function?
The following proposition answers this question in a general setting: the ‘property’
A needs to be closed in the metric space.


Proposition 3.4 


If x_n ∈ A and x_n → x, then x ∈ Ā.


Conversely, in a metric space, for any x ∈ Ā there is a sequence x_n ∈ A
which converges to x.


In particular, closed sets are “closed” under the process of taking the limit.


Proof Take any ball B_r(x) about x. If x_n converges to x, then all the sequence points
will be in the ball for n large enough. Since x_n ∈ A, x cannot be an exterior point,
and so lim_{n→∞} x_n = x ∈ Ā. Of course, when A is closed, Ā = A (Proposition 2.16).

For the converse, let B_{1/n}(x) be a decreasing sequence of nested balls around
x ∈ Ā; whether x is a boundary or interior point of A, B_{1/n}(x) contains at least a
point a_n in A (which could be x itself). So d(a_n, x) < 1/n → 0 as n → ∞, and
a_n → x. □

Exercises 3.5
1. ▸ In R,
(a) 1/n → 0 (this is a rewording of the Archimedean property of the real numbers:
for every x > 0, there is an n ∈ N such that n > x).


(b) a^n → 0 when 0 < a < 1, but diverges for a > 1. (Hint: When 1 < a = 1 + δ,
then a^n = 1 + nδ + ⋯ > nδ; otherwise consider 1/a.)


(c) n/a^n → 0 when a > 1, hence n^k/a^n = (n/b^n)^k → 0 (where b := a^(1/k) > 1).


(d) ⁿ√a → 1 for any a > 0, and ⁿ√n → 1 (so (log n)/n → 0). (Assuming
a > 1, expand a^(1/n) =: 1 + α_n using the binomial theorem to show that
α_n < a/n → 0; similarly show α_n² < 2/(n − 1) for the second sequence.)


(e) * (1 + 1/n)^n converges to a number denoted e. This is too hard to show for
the moment. Show at least that the sequence is increasing but bounded by 3,
using the binomial theorem. (This highlights the need of “convergence tests”:
how can one know that a sequence converges when the limit is unknown?)


(f) ⁿ√(n!) → ∞ (what should this mean?)
2. What do the sequences √(2 + √(2 + √(2 + ⋯))) and 1 + 1/(1 + 1/(1 + ⋯)) converge to,
assuming they do?


3. In R, if a_n → 0 then a_n^n → 0; find examples where (i) a_n → 0 but a_n^(1/n) ↛ 0;
(ii) a_n → 1 but a_n^n ↛ 1.
4. ▸ If a_n ≤ b_n for two convergent real sequences then lim_{n→∞} a_n ≤ lim_{n→∞} b_n
(Hint: [0, ∞[ is closed). In particular, if a_n converges and a_n ≤ a, then
lim_{n→∞} a_n ≤ a.


5. Squeezing principle: In R, if a_n ≤ x_n ≤ b_n and lim_{n→∞} a_n = a = lim_{n→∞} b_n,
then x_n converges (to a).

6. It is possible for a divergent sequence to have a convergent subsequence. Find 
one in the sequence (1, —1, 1, —1, ...). But any rearrangement must diverge. 


7. ▸ We may occasionally encounter ‘sequences’ with two indices (a_{nm}) (they are
more properly called nets). The example n/(n + m) shows that in general

lim_{m→∞} lim_{n→∞} a_{nm} ≠ lim_{n→∞} lim_{m→∞} a_{nm}.

The same example shows that, in R, generally, sup_n inf_m a_{nm} ≠ inf_m sup_n a_{nm}.
But the following are true:

(a) sup_n sup_m a_{nm} = sup_m sup_n a_{nm},
(b) sup_n (a_n + b_n) ≤ sup_n a_n + sup_n b_n.

8. If x_n → x in a metric space X, and x_n ≠ x for all n, then x is a limit point of the
set { x₁, x₂, x₃, … }. But if x_n is eventually constant (n > N ⇒ x_n = x), then
x_n → x without x being a limit point of { x_n }.

Note that given a real sequence (x_n), even one that does not converge, the largest
limit point of { x_n } is denoted lim sup x_n and the smallest lim inf x_n (if they exist).


3.2 Continuity 


One is often not particularly interested in the actual values of the distances between 
points: no new theorems will result by substituting metres with feet. What matters 
more, in most cases, is convergence. Accordingly, functions that preserve convergence
(rather than distance) take on a central importance.


Definition 3.6 


A function f : X → Y between metric spaces is continuous when it preserves
convergence,
x_n → x in X ⇒ f(x_n) → f(x) in Y.


In this case therefore, f(lim_{n→∞} x_n) = lim_{n→∞} f(x_n). Before we see any examples,
let us prove that the following three statements are equivalent formulations of
continuity in metric spaces, so any of them can be taken as the definition of continuity.


Theorem 3.7 


A function f: X → Y between metric spaces is (i) continuous
⇔ (ii) ∀x ∈ X, ∀ε > 0, ∃δ > 0,
d_X(x, x′) < δ ⇒ d_Y(f(x), f(x′)) < ε,


⇔ (iii) For every open set V in Y, f⁻¹V is open in X.


The second statement is often written as lim_{x′→x} f(x′) = f(x) for all x.
Proof (i) ⇒ (ii) Suppose statement (ii) is false; then there is a point x ∈ X and an
ε > 0 such that arbitrarily small changes to x can lead to sudden variations in f(x),

∀δ > 0, ∃x′, d_X(x, x′) < δ AND d_Y(f(x), f(x′)) ≥ ε.

In particular, letting δ = 1/n, there is a sequence¹ x_n ∈ X satisfying d_X(x, x_n) <
1/n but d_Y(f(x), f(x_n)) ≥ ε. This means that x_n → x, but f(x_n) ↛ f(x),
contradicting statement (i).

¹ This selection of points x_n needs the Axiom of Choice for justification.

(ii) ⇒ (iii) Note that (ii) can be rewritten as

∀x ∈ X, ∀ε > 0, ∃δ > 0, x′ ∈ B_δ(x) ⇒ f(x′) ∈ B_ε(f(x))

or even as

∀x ∈ X, ∀ε > 0, ∃δ > 0, fB_δ(x) ⊆ B_ε(f(x)).

Let V be an open set in Y. To show that U := f⁻¹V is open in X, let x be any point
of U; then f(x) ∈ V, which is open. Hence

f(x) ∈ B_ε(f(x)) ⊆ V,

and so

∃δ > 0, fB_δ(x) ⊆ B_ε(f(x)) ⊆ V.

In other words, x is an interior point of U,

∃δ > 0, B_δ(x) ⊆ f⁻¹V = U.

(iii) ⇒ (i) Let (x_n) be a sequence converging to x. Consider any open neighborhood
B_ε(f(x)) of f(x). Then f⁻¹B_ε(f(x)) contains x, and is an open set by (iii), so

∃δ > 0, x ∈ B_δ(x) ⊆ f⁻¹B_ε(f(x)),
⇒ ∃δ > 0, fB_δ(x) ⊆ B_ε(f(x)).

But eventually all the points x_n are inside B_δ(x),

∃N > 0, n > N ⇒ x_n ∈ B_δ(x)
 ⇒ f(x_n) ∈ fB_δ(x) ⊆ B_ε(f(x))
 ⇒ d_Y(f(x_n), f(x)) < ε.

This shows that f(x_n) → f(x) as n → ∞. □

Examples 3.8
1. The square root function on [0, ∞[ is continuous.


Proof Let x, ε > 0, and δ := ε√x (for x = 0, choose δ = ε²), then

|x − y| < δ ⇒ |√x − √y| < δ/(√x + √y) = ε/(1 + √(y/x)) ≤ ε.


2. Let X, Y, Z be metric spaces, then the function h: X → Y × Z defined by
h(x) := (f(x), g(x)) is continuous if, and only if, f, g are continuous. For
example, the circle path θ ↦ (cos θ, sin θ) is a continuous map R → R².


Proof The statement follows directly from Example 3.3(4),

(f(x_n), g(x_n)) → (f(x), g(x)) ⇔ f(x_n) → f(x) AND g(x_n) → g(x).


3. ▸ If f: X → Y is continuous, then fĀ is contained in the closure of fA. So if A is
dense in X, then fA is dense in fX.


Proof If x ∈ Ā, then there is a sequence of elements of A that converge to x,
x_n → x (Proposition 3.4). By continuity of f, f(x_n) → f(x), so f(x) belongs to the
closure of fA. It follows that if Ā = X then fX is contained in the closure of fA.
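The choice δ := ε√x in the square root example can be tested numerically. The sketch below samples points y near x and checks the ε-δ implication in floating point; `delta_for_sqrt` and `spot_check` are hypothetical helper names, the sampling stays slightly inside the δ-ball to avoid rounding at the edge, and passing such a test is evidence only, not a proof.

```python
import math
import random

def delta_for_sqrt(x: float, eps: float) -> float:
    """The delta used in the proof: eps*sqrt(x) for x > 0, and eps**2 at x = 0."""
    return eps * math.sqrt(x) if x > 0 else eps**2

def spot_check(x: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample y >= 0 with |x - y| < delta and test |sqrt(x) - sqrt(y)| < eps."""
    rng = random.Random(seed)
    delta = delta_for_sqrt(x, eps)
    for _ in range(trials):
        y = x + delta * rng.uniform(-0.999, 0.999)
        if y < 0:
            continue                     # outside the domain [0, oo[
        if not abs(math.sqrt(x) - math.sqrt(y)) < eps:
            return False
    return True

for x in (0.0, 1e-6, 1.0, 100.0):
    print(x, spot_check(x, 1e-3), spot_check(x, 0.1))
```

Note how δ shrinks with x: near 0 the same ε needs a much smaller δ, which is why the case x = 0 is treated separately.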


The following three propositions affirm that continuity is well-behaved with 
respect to composition and products, and that the distance function is continuous. 
They allow us to build up continuous functions from simpler ones. 


Proposition 3.9 


If f: X → Y and g: Y → Z are continuous, so is g ∘ f: X → Z.


Proof Let x_n → x in X. Then by continuity of f, f(x_n) → f(x) in Y, and by
continuity of g,

g ∘ f(x_n) = g(f(x_n)) → g(f(x)) = g ∘ f(x) in Z.

Alternatively, let W be any open set in Z. Then g⁻¹W is an open set in Y, and so
f⁻¹g⁻¹W is an open set in X. But this set is precisely (g ∘ f)⁻¹W. □


Proposition 3.10 


The distance function d: X² → R is continuous.

Proof Let x_n → x and y_n → y in X. Then, by the triangle inequality,

|d(x_n, y_n) − d(x, y)| ≤ |d(x_n, y_n) − d(x, y_n)| + |d(x, y_n) − d(x, y)|
 ≤ d(x_n, x) + d(y_n, y) → 0,

which gives d(x_n, y_n) → d(x, y) as n → ∞. □
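The estimate in this proof can be observed numerically. The following sketch uses the Euclidean distance in the plane as one concrete metric (an illustrative choice; the proposition holds for any metric) and compares the two sides of the inequality along sample sequences x_n → x and y_n → y.

```python
import math

def d(p, q):
    """Euclidean distance in the plane, chosen here only for illustration."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def gap(xn, yn, x, y):
    """Both sides of |d(xn, yn) - d(x, y)| <= d(xn, x) + d(yn, y)."""
    return abs(d(xn, yn) - d(x, y)), d(xn, x) + d(yn, y)

x, y = (0.0, 0.0), (3.0, 4.0)
for n in (1, 10, 100, 1000):
    xn = (1.0 / n, 0.0)              # x_n -> x
    yn = (3.0, 4.0 + 1.0 / n)        # y_n -> y
    lhs, rhs = gap(xn, yn, x, y)
    print(n, lhs, rhs, lhs <= rhs + 1e-12)
```

The small tolerance 1e-12 only guards against floating-point rounding; the inequality itself is exact.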


Homeomorphisms 


Continuous functions preserve convergence, a central concept in metric spaces; in
this sense, they correspond to homomorphisms of groups and rings, which preserve
the group and ring operations. The analogue of an isomorphism is called a
homeomorphism:


Definition 3.11 


A homeomorphism between metric spaces X and Y is a mapping J: X → Y
such that
J is bijective (1-1 and onto),
J is continuous,
J⁻¹ is continuous.


A metric space X is said to be embedded in another space Y, when there is a 
subset Z C Y such that X is homeomorphic to Z. 


Like all other isomorphisms, “X is homeomorphic to Y” is an equivalence relation
on metric spaces. When X and Y are homeomorphic, they are not only the same as
sets (the bijection part) but also with respect to convergence:

x_n → x ⇔ J(x_n) → J(x),

and
A is open in X ⇔ JA is open in Y.


The elements of Y are those of X in different clothing, as far as convergence is 
concerned. The most vivid picture is that of “deforming” one space continuously 
and reversibly from the other. The by-now classic example is that a ‘teacup’ is 
homeomorphic to a ‘doughnut’. 
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Exercises 3.12 


1. 


10. 


11. 


12. 


Any constant function $f: x \mapsto y_0 \in Y$ is continuous. The identity function
$I: X \to X$, $x \mapsto x$, is always continuous.


The functions that map the real number $x$ to $x + 1$, $2x$, $x^n$ ($n \in \mathbb{N}$), $a^x$ ($a > 0$),
and $|x|$ are all continuous.


In $\mathbb{R}$, addition and multiplication are continuous, i.e., if $x_n \to x$ and $y_n \to y$ then
$x_n + y_n \to x + y$ and $x_n y_n \to xy$. Deduce that if $f, g: X \to \mathbb{R}$ are continuous
functions, then so are $f + g$ and $fg$. For example, the polynomials on $\mathbb{R}$ are
continuous. The function $\max: \mathbb{R}^2 \to \mathbb{R}$ is also continuous, i.e.,
$\max(x_n, y_n) \to \max(x, y)$.


The function $f: ]0, \infty[ \to ]0, \infty[$, defined by $f(x) := 1/x$, is continuous.
Conjugation in $\mathbb{C}$, $z \mapsto \bar{z}$, is continuous.


In $\mathbb{R}$, the characteristic function
$1_A(x) := \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases}$
is always discontinuous except when $A = \emptyset$ or $A = \mathbb{R}$. Is this true for all metric spaces?


When $f: X \to \mathbb{R}$ is a continuous function, the set $\{x \in X : f(x) > 0\}$ is open
in $X$.


Any function $f: \mathbb{N} \to \mathbb{N}$ is continuous.


The graph of a continuous function $f: X \to Y$, namely $\{(x, f(x)) : x \in X\}$,
is closed in $X \times Y$ (with the $D_1$ metric).


Find examples of continuous functions $f$ (in $X = Y = \mathbb{R}$) such that


(a) $f$ is invertible but $f^{-1}$ is not continuous.
(b) $f(x_n) \to f(x)$ in $Y$ but $(x_n)$ does not converge at all.


(c) $U$ is open in $X$ but $fU$ is not open in $Y$. However functions which map
open sets to open sets do exist (find one) and are called open mappings.


If $F$ is a closed set in $Y$ and $f: X \to Y$ is a continuous function, then $f^{-1}F$
is closed in $X$. But $f$ may map a closed set to a non-closed set (even if $f$ is an
open mapping!).


It is not enough that $f(x, y)$ is continuous in $x$ and $y$ separately in order that $f$
be continuous. For example, show that the function


$f(x, y) := \dfrac{xy}{x^2 + y^2}, \qquad f(0, 0) := 0,$


is discontinuous at $(0, 0)$ even though $f(x_n, 0) \to 0$, $f(0, y_n) \to 0$, when
$x_n \to 0$, $y_n \to 0$. It needs to be “jointly continuous” in the sense that
$f(x_n, y_n) \to f(x, y)$ for any $(x_n, y_n) \to (x, y)$.
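
A quick numerical illustration of this exercise, offered only as a sketch. It assumes the function meant here is $f(x, y) = xy/(x^2 + y^2)$ with $f(0, 0) = 0$; the name `f` and the sample points are choices made for this example. Along either axis the values are 0, yet along the diagonal they stay at 1/2, so no single limit exists at the origin.

```python
def f(x: float, y: float) -> float:
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


# Approach (0, 0) along the x-axis, the y-axis, and the diagonal x = y.
for n in (1, 10, 100, 1000):
    t = 1.0 / n
    print(n, f(t, 0.0), f(0.0, t), f(t, t))
```

The first two columns are always 0 while the third stays at 0.5, which is the failure of joint continuity described above.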
13. The function $f(x) := p(\sqrt{x})$ on the domain $\mathbb{R}^+$, where $p$ is a polynomial, is
continuous.


14. The roots of a quadratic equation $ax^2 + bx + c = 0$ vary continuously as the
coefficients change (but maintaining $b^2 > 4ac$), except at $a = 0$.


15. ▸ Use the continuity of $d$ to find a short proof that the sphere $S_r := \{y : d(x, y) = r\}$
is closed.


16. Given a set $A \subseteq X$, the map $x \mapsto d(x, A)$ is continuous.
(Hint: $d(y, A) \le d(y, x) + d(x, A)$.)


17. Given disjoint non-empty closed subsets $A, B \subseteq X$, find a continuous function
$f: X \to [0, 1]$ such that $fA = 0$, $fB = 1$ (Hint: use $d(x, A)$ and $d(x, B)$).


18. Every interval in $\mathbb{R}$ is homeomorphic to $[0, 1]$, $[0, \infty[$, or $\mathbb{R}$.


19. $\mathbb{N}$ is homeomorphic to the discrete metric space on a countable set, but $\mathbb{Q}$ is
not. (Hint: The convergent sequence $1/n \to 0$ must correspond to a divergent
sequence in $\mathbb{N}$.)


20. ▸ A bent line in the plane, consisting of two straight line segments meeting at
their ends, is homeomorphic to the unbent line. Thus angles are meaningless
as far as homeomorphisms are concerned; triangles, squares and circles are
homeomorphic.


Chapter 4 
Completeness and Separability 


4.1 Completeness 


Our task of rigorously defining convergence in a general space has been achieved, 
but there seems to be something circular about it, because convergence is defined 
in terms of a limit. For example, take a convergent sequence $x_n \to x$ in a metric
space $X$, and “artificially” remove the point $x$ to form $X \setminus \{x\}$ (assume $\forall n,\ x_n \ne x$).
The other points $x_n$ still form a sequence in this subspace, but it no longer converges
(otherwise it would have converged to two points in $X$); its limit is “missing”. The
sequence $(x_n)$ is convergent in $X$ but divergent in $X \setminus \{x\}$. How are we to know whether
a metric space has “missing” points? And if it has, is it possible to create them when 
the bigger space X is unknown? 

To be more concrete, let us take a look at the rational numbers: consider the 
sequences (1, 2,3,...), (1, —1,1,—1,...), and (1, 1.5, 1.417, 1.414, 1.414, ...), 
the last one defined iteratively by $a_0 := 1$, $a_{n+1} := \frac{a_n}{2} + \frac{1}{a_n}$. It is easy to show
that the first two do not converge, but, contrary to appearances, neither does the
third, the reason being that were it to converge to $a \in \mathbb{Q}$, then $a = a/2 + 1/a$,
implying $a^2 = 2$, which we know cannot be satisfied by any rational number. This
sequence seems a good candidate of one which converges to a “missing” number not 
found in Q. Having found one missing point, there are an infinite number of them: 
(2, 2.5, 2.417, 2.414, ...) and (2, 3, 2.834, 2.828, .. .) cannot converge in Q. 
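
The third sequence can be followed exactly inside $\mathbb{Q}$. The sketch below uses Python's `fractions.Fraction` for exact rational arithmetic; the function name `sqrt2_iterates` is just a label chosen for this example. Consecutive terms get very close to each other, yet no term ever squares to 2.

```python
from fractions import Fraction


def sqrt2_iterates(steps: int) -> list[Fraction]:
    """Return a_0, ..., a_steps for a_0 = 1, a_{n+1} = a_n/2 + 1/a_n, as exact rationals."""
    a = Fraction(1)
    terms = [a]
    for _ in range(steps):
        a = a / 2 + 1 / a
        terms.append(a)
    return terms


terms = sqrt2_iterates(5)
for a in terms:
    # Print a decimal view of a_n and the exact rational value of a_n^2 - 2.
    print(float(a), a * a - 2)
```

The second column shrinks rapidly but is never zero, in line with the argument that the limit is “missing” from $\mathbb{Q}$.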

But could it be that the first two sequences also converge to “missing” numbers? 
How are we to distinguish between sequences that “truly” diverge from those that 
converge to “missing” points? There is a property that characterizes intrinsic
convergence: suppose that $(x_n)$ is divergent in the metric space $Y$, but converges $x_n \to a$
in a bigger space $X$. Then the points get close to each other (in $Y$),


$d_Y(x_n, x_m) = d_X(x_n, x_m) \le d_X(x_n, a) + d_X(a, x_m) \to 0$, as $n, m \to \infty$.

Definition 4.1


A Cauchy sequence is one such that $d(x_n, x_m) \to 0$ as $n, m \to \infty$, that is,


$\forall \epsilon > 0,\ \exists N,\ n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon.$


To clarify this idea further, we prove: 


Proposition 4.2 


Two sequences $(x_n)$, $(y_n)$ are defined to be asymptotic when $d(x_n, y_n) \to 0$
as $n \to \infty$.


(i) Being asymptotic is an equivalence relation.
(ii) For $(x_n)$ asymptotic to $(y_n)$,


(a) if $(x_n)$ is Cauchy then so is $(y_n)$,
(b) if $(x_n)$ converges to $x$ then so does $(y_n)$.


(iii) A sequence $(x_n)$ is Cauchy if, and only if, every subsequence of $(x_n)$
is asymptotic to $(x_n)$.


Proof (i) Let $(x_n) \sim (y_n)$ signify $d(x_n, y_n) \to 0$ as $n \to \infty$. Reflexivity and
symmetry of $\sim$ are obvious. If $(x_n) \sim (y_n) \sim (z_n)$ then transitivity holds:

$d(x_n, z_n) \le d(x_n, y_n) + d(y_n, z_n) \to 0$ as $n \to \infty$.

(ii) If $d(x_n, y_n) \to 0$ and $d(x_n, x_m) \to 0$ as $n, m \to \infty$, then

$d(y_n, y_m) \le d(y_n, x_n) + d(x_n, x_m) + d(x_m, y_m) \to 0.$


Similarly, if $d(x_n, x) \to 0$ then $d(y_n, x) \le d(y_n, x_n) + d(x_n, x) \to 0$.


(iii) A Cauchy sequence satisfies
$\forall \epsilon > 0,\ \exists N,\ n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon.$


Given a subsequence $(x_{n_i})$, its indices satisfy $n_i \ge i$ (by induction on $i$: $n_1 \ge 1$,
$n_2 > n_1 \ge 1$ so $n_2 \ge 2$, etc.). Thus


$i \ge N \Rightarrow n_i, i \ge N \Rightarrow d(x_{n_i}, x_i) < \epsilon$


and $d(x_{n_i}, x_i) \to 0$ as $i \to \infty$.

Conversely, suppose $(x_n)$ is not Cauchy. Then
$\exists \epsilon > 0,\ \forall i,\ \exists n_i, m_i \ge i,\ d(x_{n_i}, x_{m_i}) \ge \epsilon,$


from which we can create the subsequences $(x_{n_1}, x_{n_2}, \ldots)$ and $(x_{m_1}, x_{m_2}, \ldots)$. If
both these subsequences were asymptotic to $(x_n)$ then there would exist an $N$ such
that $i \ge N$ implies $d(x_i, x_{n_i}) < \epsilon/2$ as well as $d(x_i, x_{m_i}) < \epsilon/2$. Combining these
two then gives a contradiction


$d(x_{n_i}, x_{m_i}) \le d(x_i, x_{n_i}) + d(x_i, x_{m_i}) < \epsilon,$


so one of the two subsequences is not asymptotic to $(x_n)$. □


Examples 4.3 


1. Convergent sequences are always Cauchy, since if $x_n \to x$ then $d(x_n, x_m) \to
d(x, x) = 0$ by continuity of the distance function. But the discussion above
gives examples of Cauchy sequences which do not converge.


2. In $\mathbb{R}$ or $\mathbb{Q}$, any increasing sequence that is bounded above, $a_n \le b$, is Cauchy.


Proof Split the interval $[a_0, b]$ into subintervals of length $\epsilon$. Let $I$ be the last
subinterval which contains a point, say $a_N$. As the sequence is increasing, $I$
contains all of the sequence from $N$ onward, proving the statement.


3. $\mathbb{R}$ and $\mathbb{Q}$ have the bisection property: Let $[a_0, b_0]$
be an interval in $\mathbb{R}$ or $\mathbb{Q}$, and divide it into halves,
$[a_0, c]$ and $[c, b_0]$, where $c := (a_0 + b_0)/2$
is the midpoint. Choose $[a_1, b_1]$ to be either
$[a_0, c]$ or $[c, b_0]$, randomly or according to some
criterion; continue taking midpoints to get a
nested sequence of intervals $[a_n, b_n]$, whose
lengths are


$b_n - a_n = (b_0 - a_0)/2^n \to 0.$

[Figure: the nested intervals $[a_n, b_n]$ produced by repeated bisection of $[a_0, b_0]$.]

So, for any $\epsilon > 0$, there is an $N > 0$ such that $b_N - a_N < \epsilon$, and for any $n \ge N$,
$a_n, b_n \in [a_N, b_N]$. Hence $(a_n)$ and $(b_n)$ are asymptotic Cauchy sequences.
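
The bisection procedure can be sketched in a few lines. This is only an illustration in $\mathbb{Q}$ using exact fractions; the function name `bisect_intervals` and the selection criterion (keep the left half when the midpoint $c$ satisfies $c^2 \ge 2$) are assumptions made for the example, not part of the text.

```python
from fractions import Fraction


def bisect_intervals(a0, b0, keep_left, steps):
    """Repeatedly halve [a0, b0]; keep [a, c] when keep_left(c) holds, else [c, b]."""
    a, b = Fraction(a0), Fraction(b0)
    intervals = [(a, b)]
    for _ in range(steps):
        c = (a + b) / 2  # midpoint, still rational
        if keep_left(c):
            b = c
        else:
            a = c
        intervals.append((a, b))
    return intervals


# Hypothetical criterion for this sketch: keep the left half when c^2 >= 2.
intervals = bisect_intervals(1, 2, lambda c: c * c >= 2, 30)
a_n, b_n = intervals[-1]
print(float(a_n), float(b_n), b_n - a_n)
```

The endpoint sequences $(a_n)$ and $(b_n)$ are both rational, the lengths are exactly $(b_0 - a_0)/2^n$, and with this criterion they squeeze in on the “missing” number whose square is 2.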


4. Let $B_{r_n}$ be a nested sequence of balls ($B_{r_{n+1}} \subseteq B_{r_n}$), with $r_n \to 0$. Then choosing
any points $x_n \in B_{r_n}$ gives a Cauchy sequence.


Proof For any $m \ge n$,


$x_m \in B_{r_m} \subseteq B_{r_{m-1}} \subseteq \cdots \subseteq B_{r_n},$


so that $d(x_m, x_n) < 2r_n \to 0$ as $n, m \to \infty$.

5. A Cauchy sequence cannot stray too far in the sense that $d(x_0, x_n) \le R$ for all
$n$, for some $R > 0$. Hence Cauchy sequences are “bounded”.
Proof By the definition of a Cauchy sequence for $\epsilon := 1$ say, there is an $N$ such
that $n, m \ge N \Rightarrow d(x_n, x_m) < \epsilon$. Therefore, for $n \ge N$,


$d(x_0, x_n) \le d(x_0, x_N) + d(x_N, x_n) < d(x_0, x_N) + \epsilon.$


6. A Cauchy sequence in $\mathbb{Q}$ either converges to 0, or is eventually greater than some
$\epsilon > 0$ or less than some $-\epsilon < 0$. In each case, an asymptotic sequence behaves
in the same manner.


Proof If $a_n \not\to 0$ yet is Cauchy, then


$\exists \epsilon > 0,\ \forall M,\ \exists m \ge M,\ |a_m| \ge \epsilon,$
$\exists N,\ m, n \ge N \Rightarrow |a_n - a_m| < \epsilon/2.$


Assuming, for example, $a_m \ge \epsilon$ for some $m \ge N$,
$n \ge N \Rightarrow a_n \ge a_m - |a_n - a_m| > \epsilon/2.$


If $(b_n)$ is an asymptotic sequence, there is an $M$ such that $|a_n - b_n| < \epsilon/4$
whenever $n \ge M$, and so


$n \ge \max(N, M) \Rightarrow b_n \ge a_n - |a_n - b_n| > \epsilon/4.$


Complete Metric Spaces 


Definition 4.4 


A metric space is complete when every Cauchy sequence in it converges. 


In a complete metric space, there are no “missing” points and any divergent 
sequence is “truly” divergent—there is no bigger metric space which makes it con- 
vergent. 

It follows that the space of rational numbers Q (with the standard metric) is not 
complete, a fact that allegedly deeply troubled Pythagoras and his followers. They 
shouldn’t have worried because there is a way of creating the missing numbers (but 
skip the proof if it worries you on a first reading!): 


Theorem 4.5 


The real number space $\mathbb{R}$ is complete.

Proof (i) For this to be a theorem, we need to be clear about what constitutes $\mathbb{R}$. The
usual definition is that it is a set with an addition $+$ and multiplication $\cdot$ which satisfy
the axioms of a field (see p. 9), and with a linear order relation $\le$ that is compatible
with these operations:


$x \le y \Rightarrow x + z \le y + z, \qquad x, y \ge 0 \Rightarrow xy \ge 0,$


and in addition satisfies the completeness axiom:
Every non-empty subset $A$ of $\mathbb{R}$ with an upper bound has a least upper bound.


Assuming all these axioms, let $(a_n)$ be a Cauchy sequence in $\mathbb{R}$, that is, for any $\epsilon > 0$,
there is an $N$ beyond which $|a_n - a_m| < \epsilon$. Let


$B := \{x \in \mathbb{R} : \exists M,\ n \ge M \Rightarrow x \le a_n\}.$


Its elements might be called eventual lower bounds of $\{a_n : n \in \mathbb{N}\}$. The fact that
Cauchy sequences are bounded implies that $\{a_n : n \in \mathbb{N}\}$ has a lower bound and so
$B \ne \emptyset$, while any upper bound of $\{a_n : n \in \mathbb{N}\}$ is also one of $B$. Hence, by the
completeness axiom, $B$ has a least upper bound $\alpha$. Two facts follow,


(a) $\alpha + \epsilon$ is not an element of $B$, so there must be an infinite number of terms
$a_{n_i} < \alpha + \epsilon$;

(b) $\alpha - \epsilon$ is not an upper bound of $B$, so there must exist an $x \in B$ and an $M$ such
that $n \ge M \Rightarrow \alpha - \epsilon < x \le a_n$.

These facts together imply that for $n_i \ge \max(M, N)$ we have $\alpha - \epsilon < a_{n_i} < \alpha + \epsilon$. Then


$n \ge \max(M, N) \Rightarrow |a_n - \alpha| \le |a_n - a_{n_i}| + |a_{n_i} - \alpha| < 2\epsilon$


as required to show $a_n \to \alpha$.


This proof is open to the criticism that we have not proved whether, in fact, there 
exists such a set with all these properties. We need to fill this logical gap by giving 
a construction of R that satisfies these axioms. 


(ii) The whole idea is to treat the Cauchy sequences of rational numbers themselves
as the missing numbers! How can a sequence be a number? Actually, this is not
really that novel: the familiar decimal representation of a real number is a particular
Cauchy sequence, e.g. $e := 2.71828\ldots$ is just short for $(2, 2.7, 2.71, 2.718, \ldots)$. There is
of course nothing special about the decimal system; the binary expansion
$(2,\ 2\tfrac{1}{2},\ 2 + \tfrac{1}{2} + \tfrac{1}{8},\ \ldots)$, along with several other Cauchy sequences, also converges to $e$. We should
be grouping these asymptotic Cauchy sequences together, and treat each class as one
real number. For example, the asymptotic sequences $0.32999\ldots$ and $0.33000\ldots$
represent the same real number.
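
The last remark can be checked with exact arithmetic. In this small sketch the names `below` and `at` are invented for the example: `below(n)` is the truncation of $0.32999\ldots$ to $n$ decimal places and `at(n)` is the truncation of $0.33000\ldots$.

```python
from fractions import Fraction


def below(n: int) -> Fraction:
    """Truncation of 0.32999... to n >= 2 decimal places: 0.32, 0.329, 0.3299, ..."""
    return Fraction(33, 100) - Fraction(1, 10**n)


def at(n: int) -> Fraction:
    """Truncation of 0.33000... to n >= 2 decimal places: always 0.33."""
    return Fraction(33, 100)


# d(a_n, b_n) = 10^(-n) -> 0, so the two sequences are asymptotic.
gaps = [abs(at(n) - below(n)) for n in range(2, 8)]
print([float(g) for g in gaps])
```

Since the gaps tend to 0, the two Cauchy sequences fall in the same class and so stand for one real number.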

Accordingly, $\mathbb{R}$ is defined as the set of equivalence classes of asymptotic Cauchy
sequences of rational numbers; each real number is here written as $x = [a_n]$ (instead
of the cumbersome $[(a_n)]$). We now develop the structure of $\mathbb{R}$: addition and
multiplication, its order and distance function. Define

$x + y = [a_n] + [b_n] := [a_n + b_n], \qquad xy = [a_n][b_n] := [a_n b_n].$


That addition is well-defined follows from an application of the triangle inequality in
$\mathbb{Q}$; that it has the associative and commutative properties follows from the analogous
properties for addition of rational numbers. The new real zero is $[0, 0, \ldots]$, and the
negatives are $-x = -[a_n] = [-a_n]$. Similarly, multiplication is well defined and
has all the properties that make $\mathbb{R}$ a field.
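
As a rough sketch of these definitions, a real number can be modelled by a rule producing the $n$-th rational term of one representative Cauchy sequence, with the operations acting term by term. All names below (`CauchySeq`, `const`, `add`, `mul`, `neg`, `sqrt2`) are hypothetical helpers for this illustration, and the sketch works with representatives only; it does not implement the equivalence classes themselves.

```python
from fractions import Fraction
from typing import Callable

# A representative Cauchy sequence: n -> n-th rational term.
CauchySeq = Callable[[int], Fraction]


def const(q) -> CauchySeq:
    """The rational q as the constant sequence (q, q, ...)."""
    return lambda n: Fraction(q)


def add(a: CauchySeq, b: CauchySeq) -> CauchySeq:
    """[a_n] + [b_n] := [a_n + b_n]."""
    return lambda n: a(n) + b(n)


def mul(a: CauchySeq, b: CauchySeq) -> CauchySeq:
    """[a_n][b_n] := [a_n b_n]."""
    return lambda n: a(n) * b(n)


def neg(a: CauchySeq) -> CauchySeq:
    """-[a_n] := [-a_n]."""
    return lambda n: -a(n)


def sqrt2(n: int) -> Fraction:
    """n-th term of a_0 = 1, a_{k+1} = a_k/2 + 1/a_k."""
    a = Fraction(1)
    for _ in range(n):
        a = a / 2 + 1 / a
    return a


# The sequence for x*x - 2, with x = [sqrt2], should be asymptotic to zero.
residual = add(mul(sqrt2, sqrt2), neg(const(2)))
print(float(residual(1)), float(residual(4)), float(residual(6)))
```

The printed terms shrink towards 0, which is what it means for $[a_n]^2 = 2$ to hold in this construction even though no single rational term squares to 2.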

It is less straightforward to define an inequality relation on $\mathbb{R}$. Let $(a_n) > 0$ mean
that the Cauchy sequence $(a_n)$ is eventually strictly positive (Example 4.3(6)),


$\exists \epsilon \in \mathbb{Q}^+,\ \exists N,\ n \ge N \Rightarrow a_n > \epsilon > 0.$


Any other asymptotic Cauchy sequence must also eventually be strictly positive.
Correspondingly, let $x < y$ mean that $y - x > 0$, or equivalently,


$[a_n] < [b_n] \iff \exists \epsilon \in \mathbb{Q}^+,\ \exists N,\ \forall n \ge N,\ a_n + \epsilon < b_n.$
This immediately shows that $x < y \iff x + z < y + z$. We make a few more
observations about this relation:
1. if $a_n \ge 0$ for all $n$, then $[a_n] \ge 0$,
2. if $0 < x$ and $0 < y$ then $0 < xy$ and $0 < x + y$ (gives transitivity of $<$),
3. $x > 0$ OR $x = 0$ OR $x < 0$ (Example 4.3(6)),
4. if $x < 0$ then $-x > 0$.


Anti-symmetry of $<$ follows from the fact that $(b_n - a_n)$ cannot eventually be both
strictly positive and strictly negative. This makes $\mathbb{R}$ a linearly ordered field.


Given a real number $x = [a_n] = [b_n]$, let $|x| := [|a_n|]$, which makes sense since
$\big||a_n| - |a_m|\big| \le |a_n - a_m| \to 0$ as $n, m \to \infty$,
$\big||a_n| - |b_n|\big| \le |a_n - b_n| \to 0$ as $n \to \infty$.


In fact $|x| = x$ when $x \ge 0$ and $|x| = -x$ when $x < 0$, so it satisfies the properties
$|x| \ge 0$, $|x| = 0 \iff x = 0$, $|-x| = |x|$, and $|x + y| \le |x| + |y|$. Thus
$d(x, y) := |x - y|$ is a distance, as in Example 2.2(1).


$\mathbb{Q}$ is dense in $\mathbb{R}$: Note that a rational number $a$ can be represented in $\mathbb{R}$ by the
constant sequence $[a, a, \ldots]$. The Archimedean property holds since $[a_n] > 0$ implies
that eventually $a_n > p > 0$, $\exists p \in \mathbb{Q}$, so $[a_n] > [p/2] > 0$. Also, if $x = [a_n]$ then
$a_n \to x$ in $\mathbb{R}$, since for any $\epsilon > 0$, let $p \in \mathbb{Q}$, $0 < p < \epsilon$, so


$\exists N,\ n, m \ge N \Rightarrow |a_n - a_m| < p$
$\Rightarrow d(a_n, x) = d([a_n, a_n, \ldots], [a_1, a_2, \ldots])$
$= [|a_n - a_1|, |a_n - a_2|, \ldots]$
$\le p < \epsilon.$

The completeness axiom is satisfied: Let $A$ be any non-empty subset of $\mathbb{R}$ that is
bounded above. Split $\mathbb{R}$ into the set $B$ of upper bounds of $A$, and its complement $B^c$,
both of which are non-empty, say $a_0 \in B^c$, $b_0 \in B$; these can even be taken to be
rational, by the Archimedean property.


Divide $[a_0, b_0]$ in two using the midpoint $c := (a_0 + b_0)/2$; if $c \in B$ then select
$[a_1, b_1] = [a_0, c]$, otherwise take $[a_1, b_1] = [c, b_0]$. Continue dividing and selecting
sub-intervals like this, to get two asymptotic Cauchy sequences $(a_n)$, $(b_n)$, with
$b_n \in B$, $a_n \in B^c$. Let $\alpha := [a_n]$, so $a_n \to \alpha$, $b_n \to \alpha$, and (Exercise 3.5(4))


$\forall a \in A,\ a \le b_n \Rightarrow \forall a \in A,\ a \le \alpha$, so $\alpha$ is an upper bound of $A$,

$\forall b \in B,\ a_n \le b \Rightarrow \forall b \in B,\ \alpha \le b$, so $\alpha$ is the least upper bound.


A dual argument shows that every non-empty set with a lower bound has a greatest 
lower bound, denoted inf A. 


R is complete: This now follows from part (i), but we can see this directly in this 
context. Start with any Cauchy sequence of real numbers (in decimal form, say) and 
replace each number by a rational number to an increasing number of significant 
places, for example: 


$x_n \in \mathbb{R}$      $a_n \in \mathbb{Q}$

2.6280...      2
2.7087...      2.7
2.71...        2.71
2.7181...      2.718


The crucial point is that the two sequences are asymptotic by construction. Since the
first one is Cauchy, so must be the second one. But a Cauchy sequence of rational
numbers is, by definition, a real number $x$. Moreover, $a_n \to x$ implies $x_n \to x$. □


This “completion” process generalizes readily to any metric space. 


Theorem 4.6 


Every metric space $X$ can be completed, that is, there is a complete
metric space $\widetilde{X}$, containing (a dense copy of) $X$ and extending its distance
function.


Any such complete metric space $\widetilde{X}$ is called the completion of $X$.

Proof Construction of $\widetilde{X}$: Let $C$ be the set of Cauchy sequences of $X$. For any
two Cauchy sequences $a = (x_n)$, $b = (y_n)$, the real sequence $d(x_n, y_n)$ is also
Cauchy (Exercise 4.10(6)), and since $\mathbb{R}$ is complete, it converges to a real number
$D(a, b) := \lim_{n \to \infty} d(x_n, y_n)$. Symmetry and the triangle inequality of $D$ follow
from that of $d$, by taking the limit $n \to \infty$ in the following:


$d(y_n, x_n) = d(x_n, y_n) \ \Rightarrow\ D(b, a) = D(a, b),$
$d(x_n, y_n) \le d(x_n, z_n) + d(z_n, y_n) \ \Rightarrow\ D(a, b) \le D(a, c) + D(c, b).$


The only problem is that $D(a, b) = 0$, meaning $d(x_n, y_n) \to 0$, is perfectly possible
without $a = b$. It happens when the Cauchy sequences $(x_n)$, $(y_n)$ are asymptotic.
We have already seen that this is an equivalence relation, so $C$ partitions into
equivalence classes. Write $\tilde{d}([a], [b]) := D(a, b)$; it is well-defined since for any other
representative sequences $a' \in [a]$ and $b' \in [b]$, we have


$D(a', b') \le D(a', a) + D(a, b) + D(b, b') = D(a, b);$


similarly $D(a, b) \le D(a', b')$; so $D(a, b) = D(a', b')$. Let $\widetilde{X}$ be this space of
equivalence classes of Cauchy sequences, with the metric $\tilde{d}$.


There is a dense copy of $X$ in $\widetilde{X}$: For any $x \in X$, there corresponds the constant
sequence $\mathbf{x} := (x, x, \ldots)$ in $C$. Since


$\tilde{d}([\mathbf{x}], [\mathbf{y}]) = D(\mathbf{x}, \mathbf{y}) = \lim_{n \to \infty} d(x, y) = d(x, y),$


this set of constant sequences is a true copy of $X$, preserving distances between points.
To show that this copy is dense in $\widetilde{X}$, we need to show that any representative Cauchy
sequence $a = (x_n)$ in $C$ has constant sequences arbitrarily close to it. By the definition
of Cauchy sequences, for any $\epsilon > 0$, there is an $N \in \mathbb{N}$ with $d(x_n, x_N) < \epsilon$ for $n \ge N$.
Let $\mathbf{x}$ be the constant sequence $(x_N)$. Then $D(a, \mathbf{x}) = \lim_{n \to \infty} d(x_n, x_N) \le \epsilon < 2\epsilon$
proves that $[\mathbf{x}]$ is within $2\epsilon$ of $[a]$.


$\widetilde{X}$ is complete: Let $([a_n])$ be a Cauchy sequence in $\widetilde{X}$; this means
$\tilde{d}([a_n], [a_m]) = D(a_n, a_m) \to 0$, as $n, m \to \infty$. For each $n$, we can find a constant
sequence $\mathbf{x}_n$ which is as close to $a_n$ as needed, i.e., $D(\mathbf{x}_n, a_n) < \epsilon_n$; by choosing
$\epsilon_n \to 0$, we can select $(\mathbf{x}_n)$ to be asymptotic to $(a_n)$. As $(a_n)$ is Cauchy, so is $(\mathbf{x}_n)$.
In fact, $\mathbf{x}_n \to x := (x_n)$ since


$\lim_{n \to \infty} D(\mathbf{x}_n, x) = \lim_{m, n \to \infty} d(x_n, x_m) = 0,$


so that the asymptotic sequence $a_n$ also converges to $x$, and $[a_n]$ to $[x]$. □
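
To make the distance $D(a, b) := \lim d(x_n, y_n)$ concrete, here is a sketch for $X = \mathbb{Q}$ with $d(x, y) = |x - y|$. It only evaluates $d(x_n, y_n)$ at one large index rather than taking a true limit, and the helpers `newton_sqrt` and `D_approx` are names invented for this example.

```python
from fractions import Fraction


def newton_sqrt(k: int, start: int, n: int) -> Fraction:
    """n-th term of x_0 = start, x_{j+1} = (x_j + k/x_j)/2: a Cauchy sequence in Q."""
    x = Fraction(start)
    for _ in range(n):
        x = (x + k / x) / 2
    return x


def D_approx(a, b, n: int) -> Fraction:
    """d(x_n, y_n) at a single index n, standing in for the limit D(a, b)."""
    return abs(a(n) - b(n))


a = lambda n: newton_sqrt(2, 1, n)       # one Cauchy sequence
a_alt = lambda n: newton_sqrt(2, 3, n)   # a different sequence, asymptotic to a
b = lambda n: newton_sqrt(3, 1, n)       # a sequence in another class

print(float(D_approx(a, a_alt, 6)))  # essentially 0: same point of the completion
print(float(D_approx(a, b, 6)))      # about 0.3178: distinct points
```

The first value illustrates why $D$ alone is not a metric on $C$ (two different sequences at distance 0), and the second gives the distance between two classes.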


Proving that a given metric space is complete is normally quite hard: Even showing 
that a particular Cauchy sequence converges may not be an easy matter because one 
has to identify which point it converges to, let alone doing this for arbitrary Cauchy
sequences. But once a space is shown to be complete, one need not go through the
same proof process to show that a subspace or a product is complete:


Proposition 4.7 


Let X, Y be complete metric spaces. Then, 
(i) A subset $F \subseteq X$ is complete $\iff$ $F$ is closed in $X$,


(ii) $X \times Y$ is complete.


Proof (i) Let $F \subseteq X$ be complete, i.e., any Cauchy sequence in $F$ converges to a
limit in $F$. Let $x \in \bar{F}$, with a sequence $x_n \to x$, $x_n \in F$ (Proposition 3.4). Since
convergent sequences are Cauchy and $F$ is complete, $x$ must be in $F$. Thus $F = \bar{F}$
is closed. The completeness of $X$ has not been used, so in fact a complete subspace
of any metric space is closed.

Conversely, let $F$ be a closed set in $X$ and let $(x_n)$ be a Cauchy sequence in $F$.
Then $(x_n)$ is a Cauchy sequence in $X$, which is complete. Therefore $x_n \to x$ for
some $x \in X$. In fact, $x \in \bar{F} = F$. Thus any Cauchy sequence of $F$ converges in $F$.


(ii) Let $\binom{x_n}{y_n}$ be a Cauchy sequence in $X \times Y$. Recall that

$d\left(\binom{x_n}{y_n}, \binom{x_m}{y_m}\right) = d_X(x_n, x_m) + d_Y(y_n, y_m) \ge d_X(x_n, x_m).$


Since the left-hand side converges to 0 as $n, m \to \infty$, we get $d_X(x_n, x_m) \to 0$,
so that the sequence $(x_n)$ is Cauchy in the complete space $X$. It therefore
converges, $x_n \to x \in X$. By similar reasoning, $y_n \to y \in Y$. Consequently,


$d\left(\binom{x_n}{y_n}, \binom{x}{y}\right) = d_X(x_n, x) + d_Y(y_n, y) \to 0$ as $n \to \infty$,


which is equivalent to $\binom{x_n}{y_n} \to \binom{x}{y}$ in $X \times Y$. □


Examples 4.8 


1. The completion of a subset $A$ in a complete metric space $X$ is $\bar{A}$.


Proof The completion $Y$ of $A$ must satisfy two criteria: $Y$ must be complete, and
$A$ must be dense in $Y$. Now, $\bar{A}$ is closed in $X$, so is complete, and $A$ is dense in
$\bar{A}$ (by definition).


2. Two metric spaces may be homeomorphic yet one space be complete and the
other not. For example, $\mathbb{R}$ is homeomorphic to $]0, 1[$ (Exercise 3.12(18)), but the
latter is not closed in $\mathbb{R}$.

3. Let $f: X \to Y$ be a continuous function. If it can be extended to the completions
as a continuous function $\tilde{f}: \widetilde{X} \to \widetilde{Y}$, then this extension is unique.


Proof Any $x \in \widetilde{X}$ has a sequence $(a_n)$ in $X$ converging to it (Proposition 3.4).
As $\tilde{f}$ is continuous, we find that $\tilde{f}(x)$ is uniquely determined by


$\tilde{f}(x) = \lim_{n \to \infty} \tilde{f}(a_n) = \lim_{n \to \infty} f(a_n).$


4. But not every continuous function $f: X \to Y$ can be extended continuously to
the completions $\tilde{f}: \widetilde{X} \to \widetilde{Y}$. For example, the continuous function $f(x) := 1/x$
on $]0, \infty[$ cannot be extended continuously to $[0, \infty[$.


5. (Cantor) The completion of Q to R has come at a price: R is not countable. Prove 
this by taking the binary expansion of a list of real numbers in [0, 1], arranged in 
an infinite array, and creating a new number from the diagonal that is different 
from all of them. The next theorem is a strong generalization of this statement. 
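
The diagonal construction in Example 5 can be tried on a finite array. This is only a toy version: the rows are short bit strings chosen for the example, `diagonal_flip` is a name invented here, and the sketch ignores the subtlety that some reals have two binary expansions.

```python
def diagonal_flip(rows: list[str]) -> str:
    """Build a bit string whose i-th bit differs from the i-th bit of rows[i]."""
    return "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))


# A (finite) list of binary expansions, arranged as an array.
rows = ["0101", "1111", "0000", "1010"]
new = diagonal_flip(rows)
print(new, new in rows)
```

By construction `new` disagrees with the $i$-th row in the $i$-th place, so it cannot appear anywhere in the list; the same idea applied to an infinite list gives Cantor's argument.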


Theorem 4.9 Baire’s category theorem 


A complete metric space cannot be covered by a countable number of 
nowhere-dense sets. 


Proof Suppose that the metric space $X = \bigcup_{n=1}^{\infty} A_n$, where $A_n$ are nowhere dense.
We are going to create a nested sequence of balls whose centers form a
non-convergent Cauchy sequence, as follows: To start with, $\bar{A}_1 \ne X$ so its exterior
contains a ball $B_{r_1}(x_1) \subseteq (\bar{A}_1)^c$. Now $\bar{A}_2$ contains no balls, so the open set
$(\bar{A}_2)^c \cap B_{r_1}(x_1)$ is non-empty and there is a ball $B_{r_2}(x_2) \subseteq (\bar{A}_2)^c \cap B_{r_1}(x_1)$.


Continuing like this, we can find a sequence of points (using the Axiom of Choice)


$x_{n+1} \in B_{r_{n+1}}(x_{n+1}) \subseteq (\bar{A}_{n+1})^c \cap B_{r_n}(x_n).$


Moreover at each stage, $r_n$ can be chosen small enough that


$r_n \to 0$ (e.g. $r_n < 1/n$),


$B_{r_{n+1}}[x_{n+1}] \subseteq B_{r_n}(x_n)$ (e.g. $r_{n+1} < r_n - d(x_n, x_{n+1})$).


Thus $(x_n)$ is a Cauchy sequence (Example 4.3(4)).

Now suppose that $x_n \to x$. For all $m > n$ we have $x_m \in B_{r_{n+1}}(x_{n+1})$ and taking
the limit $x_m \to x$ we find $x \in B_{r_{n+1}}[x_{n+1}] \subseteq B_{r_n}(x_n)$. Since this holds for any $n$ we
obtain


$x \in \bigcap_n B_{r_n}(x_n) \subseteq \bigcap_n (\bar{A}_n)^c = \Big(\bigcup_n \bar{A}_n\Big)^c \subseteq \Big(\bigcup_n A_n\Big)^c = X^c = \emptyset,$


a contradiction. Having constructed a non-convergent Cauchy sequence, $X$ must be
incomplete. □


Fig. 4.1 Baire. René-Louis Baire (1874-1932), after graduating from Paris
around 1894, tackled the problem of convergence and limits of
functions, namely that no space of functions then known was
“closed” under pointwise convergence. Progress on this issue
was made by his colleague Borel in the direction of measurable
sets.


Exercises 4.10

1. Any sequence in $\mathbb{Q}$ of the type $(3.1, 3.14, 3.141, 3.1415, \ldots)$ is Cauchy.

2. The sequences $(1, 2, 3, \ldots)$ and $(1, -1, 1, -1, \ldots)$ are not Cauchy.

3. * Try to prove that the sequence defined by $a_0 := 1$, $a_{n+1} := \frac{a_n}{2} + \frac{1}{a_n}$ is Cauchy.

4. If a sequence $(x_n)$, chosen from a finite set of points, e.g. $(x, y, x, x, y, \ldots)$, is
Cauchy then it must eventually repeat $(x_0, \ldots, x_N, x_N, \ldots)$. (Hint: Generalize
Exercise 2.)

5. ▸ If $d(x_{n+1}, x_n) \le a c^n$ with $c < 1$ then $(x_n)$ is Cauchy. But a sequence which
decreases at the rate $d(x_{n+1}, x_n) \le 1/n$ need not be Cauchy. For example,
use the principle of induction to show that the example in Exercise 3 satisfies
$|a_{n+1} - a_n| \le (\tfrac{1}{2})^{n+1}$.

The following give sufficient conditions for Cauchy sequences:

(a) If $d(x_{n+1}, x_n) \le c\, d(x_n, x_{n-1})$ with $c < 1$, then $d(x_n, x_m) \le a c^n$,
(b) If $d(x_{n+1}, x_n) \le c\, d(x_n, x_{n-1})^2$ with $c\, d(x_1, x_0) < 1$ then $d(x_n, x_m) \le a b^{2^n}$,
for $n < m$ and appropriate constants $a$, $b$.

6. If $(x_n)$, $(y_n)$ are Cauchy sequences in $X$, then so is $d_n := d(x_n, y_n)$ in $\mathbb{R}$.

7. ▸ A continuous function need not map Cauchy sequences to Cauchy sequences.

8. If $x_n \to x$ and $y_n \to x$, then $(x_n)$, $(y_n)$ are asymptotic.

9. $\sqrt{n}$ and $\sqrt{n + 1}$ are asymptotic divergent sequences in $\mathbb{R}$.

10. ▸ A subsequence of a Cauchy sequence is itself Cauchy, and if it converges so
does its parent sequence.

11. If $(x_n)$ is a Cauchy sequence, and the set of values $\{x_n\}$ has a limit point $x$, then
$x_n \to x$.

12. The completion of $]0, 1[$ and of $[0, 1[$ is $[0, 1]$. Any Cauchy sequence in the
Cantor set $C$ must converge in $C$. However a Cauchy sequence of rational numbers
need not converge to a rational number because $\mathbb{Q}$ is not closed in $\mathbb{R}$.

13. ▸ $\mathbb{R}^N := \mathbb{R} \times \cdots \times \mathbb{R}$ and $\mathbb{C}$ are complete.

14. Is $\mathbb{N}$ complete? Any discrete metric space is complete.

15. (Cantor) We have already seen that the centers of a nested sequence of balls with
$r_n \to 0$ form a Cauchy sequence (Example 4.3(4)). Show, furthermore, that in
a complete metric space, $\bigcap_n B_{r_n}[x_n] = \{\lim_{n \to \infty} x_n\}$.

16. The only functions $f: \mathbb{Q} \to \mathbb{Q}$ satisfying $f(x + y) = f(x) + f(y)$ are
$f: x \mapsto \lambda x$. Deduce that the only continuous functions $f: \mathbb{R} \to \mathbb{R}$ with this property
are of the same type.

17. * The completion of $X$ is essentially unique, in the sense that any two such
completions (such as the one defined in the theorem) are homeomorphic to each
other.

18. The Cantor set is complete and nowhere dense in $\mathbb{R}$; why doesn’t this contradict
Baire’s theorem?


4.2 Uniformly Continuous Maps 


We have seen that a continuous function need not preserve completeness, or even 
Cauchy sequences. If one analyzes the root of the problem, one finds that its resolution 
lies in the following strengthening of continuity: 


Definition 4.11 


A function $f: X \to Y$ is said to be uniformly continuous when


$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall x \in X,\ fB_\delta(x) \subseteq B_\epsilon(f(x)).$


The difference from continuity is that, here, $\delta$ is independent of $x$.


Easy Consequences 


1. Uniformly continuous functions are continuous.

2. But not every continuous map is uniformly so; an example is $f(x) := 1/x$ on
$]0, \infty[$.

3. ▸ The composition of uniformly continuous maps is again uniformly continuous.
Proof $\forall \epsilon > 0,\ \exists \delta, \delta' > 0,\ \forall x,\ g(f(B_\delta(x))) \subseteq g(B_{\delta'}(f(x))) \subseteq B_\epsilon(g(f(x)))$.
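
The counterexample $f(x) := 1/x$ in the second consequence can be seen numerically. The sketch below picks the pairs $x = 1/n$, $y = 1/(n+1)$, a choice made for this illustration; the function names are likewise only labels.

```python
def f(x: float) -> float:
    return 1.0 / x


def gap_pair(n: int) -> tuple[float, float]:
    """For x = 1/n and y = 1/(n+1), return (|x - y|, |f(x) - f(y)|)."""
    x, y = 1.0 / n, 1.0 / (n + 1)
    return abs(x - y), abs(f(x) - f(y))


for n in (10, 100, 1000):
    print(n, gap_pair(n))
```

The inputs get arbitrarily close while the outputs stay about 1 apart, so for $\epsilon < 1$ no single $\delta$ can work for every $x$ in $]0, \infty[$.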


The key properties of uniformly continuous maps are the following two proposi- 
tions: 


Proposition 4.12 


A uniformly continuous function maps any Cauchy sequence to a Cauchy 
sequence. 


Proof By definition $f: X \to Y$ is uniformly continuous when
$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall x, x',\ d_X(x, x') < \delta \Rightarrow d_Y(f(x), f(x')) < \epsilon.$
In particular, for a Cauchy sequence $(x_n)$ in $X$, with this $\delta$,


$\exists N,\ n, m \ge N \Rightarrow d_X(x_n, x_m) < \delta$
$\Rightarrow d_Y(f(x_n), f(x_m)) < \epsilon,$


proving that $(f(x_n))$ is a Cauchy sequence in $Y$. □


More generally, the same proof shows that a function $f: X \to Y$ is uniformly
continuous if, and only if, it maps any asymptotic sequences $(a_n)$, $(b_n)$ in $X$ to
asymptotic sequences $(f(a_n))$, $(f(b_n))$ in $Y$.


Theorem 4.13 


Every uniformly continuous function $f: X \to Y$ has a unique uniformly
continuous extension to the completions $\tilde{f}: \widetilde{X} \to \widetilde{Y}$.


Proof In order not to complicate matters unnecessarily, let us suppose that $X$ and
$Y$ are dense subsets of $\widetilde{X}$ and $\widetilde{Y}$ respectively, instead of being embedded in them.
Nothing is lost this way, except quite a few extra symbols!

Let $x_n \to x \in \widetilde{X}$ with $x_n \in X$. The sequence $f(x_n)$ is Cauchy in $Y$ by the
previous proposition, so must converge to some element $y \in \widetilde{Y}$. Furthermore, if
$a_n \to x$ as well ($a_n \in X$), then $(x_n)$ and $(a_n)$ are asymptotic (Exercise 4.10(8))
forcing $f(x_n)$ and $f(a_n)$ to be asymptotic in $Y$, hence $f(a_n) \to y$. This allows us
to define $\tilde{f}(x) := y$ without ambiguity. Moreover, this choice is imperative and $\tilde{f}$ is
unique, if it is to be continuous.

The uniform continuity of $\tilde{f}$ follows from that of $f$. For any $\epsilon > 0$, there is a
$\delta > 0$ for which


$\forall a, b \in X,\ d(a, b) < \delta \Rightarrow d(f(a), f(b)) < \epsilon.$


Let $x, x' \in \widetilde{X}$ with $d(x, x') < \delta$, let $a_n \to x$, $b_n \to x'$ with $a_n, b_n \in X$ and, by the
above, $f(a_n) \to \tilde{f}(x)$, $f(b_n) \to \tilde{f}(x')$. Among these terms, we can find $a$ close to
$x$ and $b$ close to $x'$ to within $r := (\delta - d(x, x'))/2 < \delta$, while also $f(a)$ is close to
$\tilde{f}(x)$ and $f(b)$ is close to $\tilde{f}(x')$ to within $\epsilon$. Then


$d(a, b) \le d(a, x) + d(x, x') + d(x', b) < 2r + d(x, x') = \delta$
$\Rightarrow d(\tilde{f}(x), \tilde{f}(x')) \le d(\tilde{f}(x), f(a)) + d(f(a), f(b)) + d(f(b), \tilde{f}(x')) < 3\epsilon.$ □


The following are easily shown to be uniformly continuous functions: 


Definition 4.14 


A function $f: X \to Y$ is called a Lipschitz map when
$\exists c > 0,\ \forall x, x' \in X,\ d_Y(f(x), f(x')) \le c\, d_X(x, x').$


Furthermore, it is called


an equivalence (or bi-Lipschitz) when $f$ is bijective and both $f$ and $f^{-1}$ are
Lipschitz,


a contraction when it is Lipschitz with constant $c < 1$,


an isometry, and $X$, $Y$ are said to be isometric, when $f$ preserves distances,


$\forall x, x' \in X,\ d_Y(f(x), f(x')) = d_X(x, x').$


Examples 4.15 


1. Any $f: [a, b] \to \mathbb{R}$ with continuous derivative is Lipschitz.


Proof As $f'$ is continuous, it is bounded on $[a, b]$, say $|f'(x)| \le c$. The result
then follows from the mean value theorem,


$f(x) - f(y) = f'(\xi)(x - y), \quad \exists \xi \in\, ]x, y[.$


2. To show $f: \mathbb{R}^2 \to \mathbb{R}^2$ is Lipschitz, where $f = (f_1, f_2)$, it is enough to show
that
$|f_i(x_1, y_1) - f_i(x_2, y_2)| \le c(|x_1 - x_2| + |y_1 - y_2|), \quad i = 1, 2,$
for then (using $(a + b)^2 \le 2(a^2 + b^2)$ for $a, b \in \mathbb{R}$)


$\left| \binom{f_1(x_1, y_1) - f_1(x_2, y_2)}{f_2(x_1, y_1) - f_2(x_2, y_2)} \right| \le |f_1(x_1, y_1) - f_1(x_2, y_2)| + |f_2(x_1, y_1) - f_2(x_2, y_2)|$
$\le 2c(|x_1 - x_2| + |y_1 - y_2|)$
$\le 2c\sqrt{2} \left| \binom{x_1 - x_2}{y_1 - y_2} \right|.$


3. ▸ Lipschitz maps are uniformly continuous, since for any $\epsilon > 0$, we can let
$\delta := \epsilon/2c$ independent of $x$ to obtain $d(x, x') < \delta \Rightarrow d(f(x), f(x')) \le c\delta < \epsilon$.


4. But not every uniformly continuous function is Lipschitz. For example, $\sqrt{x}$ on
$[0, 1]$ is uniformly continuous (show!); were it also Lipschitz, it would satisfy
$|\sqrt{x} - \sqrt{0}| \le c|x - 0|$ which leads to $\sqrt{x} \ge 1/c$ for every $x \in\, ]0, 1]$, and this fails for small $x$.
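
The failure of the Lipschitz condition for $\sqrt{x}$ near 0 shows up directly in the difference quotients. In this sketch `slope_from_zero` is a name chosen for the example; it computes $|\sqrt{x} - \sqrt{0}| / |x - 0|$.

```python
import math


def slope_from_zero(x: float) -> float:
    """Difference quotient |sqrt(x) - sqrt(0)| / |x - 0| for x > 0."""
    return math.sqrt(x) / x


# Sample x = 1, 1e-2, 1e-4, 1e-6, 1e-8.
ratios = [slope_from_zero(10.0 ** -k) for k in range(0, 9, 2)]
print(ratios)
```

The quotients equal $1/\sqrt{x}$ and grow without bound as $x \to 0$, so no constant $c$ can dominate them.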


The next theorem is one of the important unifying principles of mathematics. It 


has applications in such disparate fields as differential equations, numerical analysis, 
and fractals. 


Theorem 4.16 The Banach fixed point theorem 


Let $X \ne \emptyset$ be a complete metric space. Then every contraction map
$f: X \to X$ has a unique fixed point $x = f(x)$, and the iteration $x_{n+1} :=
f(x_n)$ converges to it for any $x_0$.
Proof Consider the iteration $x_{n+1} := f(x_n)$ starting with any $x_0$ in $X$. Note that
$d(x_{n+1}, x_n) = d(f(x_n), f(x_{n-1})) \le c\, d(x_n, x_{n-1}).$
Hence, by induction on $n$,
$d(x_{n+1}, x_n) \le c^n d(x_1, x_0),$


so $(x_n)$ is Cauchy since $c < 1$ (Exercise 4.10(5)). As $X$ is complete, $x_n$ converges
to, say, $x$, and by continuity of $f$,


$f(x) = f(\lim_{n \to \infty} x_n) = \lim_{n \to \infty} f(x_n) = \lim_{n \to \infty} x_{n+1} = x.$


Moreover, the rate of convergence is given at least by $d(x, x_n) \le \frac{c^n}{1 - c} d(x_1, x_0)$.

Suppose there are two fixed points $x = f(x)$ and $y = f(y)$; then


$d(x, y) = d(f(x), f(y)) \le c\, d(x, y)$


implying $d(x, y) = 0$ since $c < 1$.
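
A minimal sketch of the iteration in the theorem, for $X = [0, 1] \subset \mathbb{R}$. The example map ($\cos$, which maps $[0, 1]$ into itself with $|\cos'(x)| = |\sin x| \le \sin 1 < 1$) and the helper name `banach_iterate` are assumptions made for this illustration; the stopping rule uses the standard a priori bound $d(x, x_n) \le \frac{c^n}{1-c} d(x_1, x_0)$.

```python
import math


def banach_iterate(f, x0, c, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) for a contraction f with constant c < 1.

    Stops once the a priori bound c^n / (1 - c) * d(x_1, x_0) is at most tol.
    Returns (x_n, n).
    """
    x = f(x0)
    d0 = abs(x - x0)  # d(x_1, x_0)
    n = 1
    while c**n / (1 - c) * d0 > tol and n < max_iter:
        x = f(x)
        n += 1
    return x, n


x_star, steps = banach_iterate(math.cos, 1.0, c=math.sin(1.0))
print(x_star, steps, abs(x_star - math.cos(x_star)))
```

The a priori bound is usually pessimistic: it guarantees the accuracy in advance from $c$ and the first step alone, whereas the observed error typically falls faster.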


Exercises 4.17 


1. Show that
(a) $f: [a, b] \to \mathbb{R}$, $f(x) := x + 1/x$, is a contraction when $a > 2^{-1/2}$;


(b) f: [0,1]? > R2, f(x,y) i= & oa -) is Lipschitz. 


x? — 


2. The composition of two Lipschitz maps is Lipschitz.

3. ▸ A Lipschitz map (with constant c) sends the ball B_r(a) into the ball B_{cr}(f(a)).

4. Isometries are necessarily 1-1. Onto isometric maps are equivalences, and the
latter are homeomorphisms.

5. ▸ Two metric spaces are said to be equivalent when there is an equivalence
map between them. Equivalent metric spaces must be both complete or both
incomplete.

6. ▸ If a space has two distances, the inequality d_1(x, y) ≤ c d_2(x, y), where
c > 0, states that the identity map is Lipschitz. In the same vein, two distances
are equivalent when there are c, c′ > 0 such that c′ d_2(x, y) ≤ d_1(x, y) ≤
c d_2(x, y). Show that two equivalent distances have exactly the same Cauchy
sequences.

7. The unit circle has two natural distance functions, (i) the arclength θ and (ii) the
Euclidean distance 2 sin(θ/2), where θ is the angle between two points (≤ π).
Prove that the two are equivalent by first showing

2θ/π ≤ sin θ ≤ θ,  0 ≤ θ ≤ π/2.

8. The distances D_1 and D_∞ for X × Y (Example 2.2(6)) are equivalent.

9. The fixed point theorem can be generalized to the case when f: B_r[x_0] → X
is a contraction map, as long as the starting point satisfies d(x_0, x_1) ≤ (1 − c)r.
Use the triangle inequality to show that x_n remain in B_r[x_0].

10. The classic example of an iteration converging to a fixed point is that provided
by a continuously differentiable function f: R → R with |f′(x)| < 1; it is a
contraction map in a neighborhood of x.

11. If f: R → R is a contraction with Lipschitz constant c < 1, then f(x) = x can
also be solved by iterating x_{n+1} := F(x_n) where F(x) := x − α(x − f(x)),
0 <a < 2/(c + 1). Hence find an approximate solution of x = sinx + | near 
to x = 7; experiment by choosing different values of ~ and compare with the 
iteration X41 := f (Xn). 
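
The relaxed iteration of the last exercise can be tried out numerically. The following is a minimal sketch, not a prescribed solution: the function f(x) = sin x + 1, the starting point 2.0, the relaxation value 0.7 and the helper names `iterate` and `relaxed` are all illustrative assumptions. Near its fixed point this f is a contraction, so both the plain and the relaxed iteration should settle on the same value, and the number of steps each one needs can be compared.

```python
import math

def iterate(step, x0, tol=1e-12, max_iter=10_000):
    """Apply x -> step(x) until successive values differ by less than tol.
    Returns the final value and the number of iterations used."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = step(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

def f(x):
    # Illustrative choice of f (an assumption made for this sketch).
    return math.sin(x) + 1.0

def relaxed(alpha):
    """Build F(x) = x - alpha * (x - f(x)) for a given relaxation parameter."""
    return lambda x: x - alpha * (x - f(x))

plain_root, plain_steps = iterate(f, 2.0)
relaxed_root, relaxed_steps = iterate(relaxed(0.7), 2.0)
print(plain_root, plain_steps)
print(relaxed_root, relaxed_steps)
```

Changing the argument of `relaxed` is the experiment the exercise asks for: values of α for which F has a smaller derivative at the fixed point need fewer steps.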


4.3 Separable Spaces 


Completeness is a “nice” property that a metric can have. A different type of property 
of a metric space is whether it is, in a sense, “computable” or “constructive”. Starting 
from the simplest, and speaking non-technically, we find: 


Finite metric spaces: There are a finite number of possible distances to
compute.

Countable metric spaces: With an infinite number of points, an algorithm may
still calculate distances precisely, but it may take longer and longer to do so.

Separable metric spaces: Points can be approximated by one of a countable
number of points; any distance can be evaluated, not precisely, but to any
accuracy.

Non-separable metric spaces: There may be no algorithm that finds the distance
between two generic points, even approximately.


Non-separable metric spaces are, in a sense, too large, while countable metric 
spaces leave out most spaces of interest. 


Definition 4.18 


A metric space is separable when it contains a countable dense subset,

∃A ⊆ X,  A countable AND Ā = X.


Examples 4.19 
1. Countable metric spaces, such as N, Z, Q, are obviously separable. 


2. ▸ R is separable because the countable subset Q is dense in it. By the next
proposition, C and R^N are also separable.¹


¹ There is a catch here: The metric used in the proposition is not the Euclidean one. But
the inequalities used there remain valid for the Euclidean metric,
√(d_X(a_n, x)² + d_Y(b_m, y)²) < √(ε²/4 + ε²/4) < ε.
Proposition 4.20


(i) Any subset of a separable metric space is separable.
(ii) The product of two separable spaces is separable.
(iii) The image of a separable space under a continuous map is separable.


Proof (i) Let Y ⊆ X and Ā = X, with A = { a_n : n ∈ N } countable. For each
a_n, let Y_{n,m} := { y ∈ Y : d(a_n, y) < 1/m }, and pick a representative point from
each, y_{n,m} ∈ Y_{n,m}, whenever the set is non-empty. This array of points is certainly
countable, and we now show that it is dense in Y.

Fix 0 < ε < 1/2; any y ∈ Y can be approximated by some a_n ∈ A with
d(a_n, y) < ε. Pick the smallest integer m such that m > 1/(2ε); then m − 1 ≤ 1/(2ε),
so m < 1/ε; therefore ε < 1/m < 2ε. Then y ∈ Y_{n,m} ≠ ∅, so that there must be a
representative y_{n,m} with d(a_n, y_{n,m}) < 1/m < 2ε. Combining the two inequalities,
we get

d(y_{n,m}, y) ≤ d(y_{n,m}, a_n) + d(a_n, y) < 3ε.

(ii) Let {a_1, a_2, ...} be dense in X, and {b_1, b_2, ...} dense in Y. Then for any ε > 0
and any pair (x, y) ∈ X × Y, x can be approximated by some a_n such that d_X(a_n, x) <
ε/2, and y by some b_m with d_Y(b_m, y) < ε/2; then

d((a_n, b_m), (x, y)) = d_X(a_n, x) + d_Y(b_m, y) < ε

shows that the countable set of points (a_n, b_m) (n, m ∈ N) is dense in X × Y.

(iii) Let f: X → Y be continuous and let A be countable and dense in X. Then fA
is countable because the number of elements of a set cannot increase by a mapping.
Moreover, as f is continuous, fA is dense in fX (Example 3.8(3)), and fX is separable. □
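
Part (ii) of the proof is constructive, and a small numerical sketch may make it concrete. The code below is only an illustration under stated assumptions: X = Y = R, the dense countable set is taken to be the dyadic rationals (rather than all of Q), the product carries the metric d_X + d_Y used in the proof, and the names `rational_approx` and `approx_pair` are hypothetical helpers.

```python
from fractions import Fraction
import math

def rational_approx(x, eps):
    """Return a dyadic rational k / 2**j with |k / 2**j - x| < eps.
    The dyadic rationals form a countable dense subset of R."""
    j = 0
    while Fraction(1, 2 ** j) >= eps:   # afterwards the rounding error 2**-(j+1) is < eps
        j += 1
    return Fraction(round(x * 2 ** j), 2 ** j)

def approx_pair(point, eps):
    """Approximate (x, y) within eps in the metric d_X + d_Y
    by approximating each coordinate within eps / 2."""
    x, y = point
    return rational_approx(x, eps / 2), rational_approx(y, eps / 2)

eps = Fraction(1, 1000)
x, y = math.sqrt(2), math.pi
a, b = approx_pair((x, y), eps)
distance = abs(float(a) - x) + abs(float(b) - y)
print(a, b, distance)
```

Each coordinate is handled separately with tolerance ε/2, exactly as in the proof, so the sum of the two errors stays below ε.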


Exercises 4.21 


1. A metric space X is separable when there is a countable number of points a_n such
that the set of balls B_ε(a_n) covers X for any ε.


2. * In a separable space, we can do with a countable number of balls (with say 
rational radii), in the sense that every open set is a countable union of some of 
these. It then follows that every cover of the space using open sets has a countable 
subcover. 


3. The union of a (countable) list of separable subsets is separable. 
4. ▸ If there are an uncountable number of disjoint balls, then the space is
non-separable, e.g. an uncountable set with the discrete metric is non-separable. We
shall meet some non-trivial examples of non-separable metric spaces later on
(Theorem 9.1).


Remarks 4.22 


1. * The proof of Baire’s theorem can be modified to show that the countable union
of closed nowhere dense sets in a complete metric space X is nowhere dense in
X.


2. Note that d(f(x), f(y)) < d(x, y) does not necessarily give a contraction map.
For example, f(x) := 2/(√(x² + 4) − x). In this case, the iteration x_{n+1} := f(x_n)
may satisfy d(x_{n+1}, x_n) → 0 but need not be a Cauchy sequence.


3. The reader has most probably seen images of fractals; many of these are the
fixed ‘point’, or attractor, of a contraction on the space of shapes
(Example 2.2(4)) (see [19]).


4. The Banach fixed point theorem is also valid when fⁿ := f ∘ ··· ∘ f, rather
than f, is a contraction map; in this case the convergence is “cyclic”.
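
The example in Remark 2 can be observed numerically. The sketch below is an illustration only; it uses the algebraically equivalent form f(x) = (√(x² + 4) + x)/2, obtained by rationalizing the denominator, to avoid cancellation for large x, and the helper name `run` and the starting point 0 are assumptions. The step sizes shrink towards zero while the iterates keep growing, so the sequence is not Cauchy.

```python
import math

def f(x):
    # f(x) = 2 / (sqrt(x^2 + 4) - x), written as (sqrt(x^2 + 4) + x) / 2
    return (math.sqrt(x * x + 4.0) + x) / 2.0

def run(x0, steps):
    """Iterate x_{n+1} = f(x_n); return the final value and the last step size."""
    x = x0
    last_step = None
    for _ in range(steps):
        x_next = f(x)
        last_step = x_next - x
        x = x_next
    return x, last_step

for steps in (10, 1_000, 100_000):
    x, step = run(0.0, steps)
    print(steps, x, step)
```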


Chapter 5 
Connectedness 


5.1 Connected Sets 


We have an intuitive notion of what it means for a shape to be in one piece. The 
following definition makes this idea precise: 


Definition 5.1 


A subset C of a metric space is disconnected when it can be divided into (at
least) two disjoint non-empty subsets C = A ∪ B such that each subset is
covered exclusively by an open set, i.e.,

A ⊆ U,  B ∩ U = ∅,  U open,
B ⊆ V,  A ∩ V = ∅,  V open.


Otherwise a set is called connected. 


Examples 5.2 


1. Single points are always connected because they cannot be split into two non- 
empty sets. Similarly the empty set is connected. 


2. » Any subset of Z (or any discrete metric space) is disconnected except the 
single points and the empty set. Metric spaces with this property are called totally 
disconnected. 


Proof Let C contain more than one point, say a and b. Take A = U := {a}
and B = V := C∖{a} ≠ ∅. Then U and V are open (any subset is open) and
respectively contain A and B exclusively.
Kazimierz Kuratowski (1896-1980, Poland) rewrote much of
Hausdorff’s theory in 1921, introducing his closure axioms and
connectedness. Similarly Aleksandrov and Urysohn, and later
Tykhonov, in Moscow, built upon Hausdorff’s work with compactness.


Fig. 5.1 Kuratowski


3. ▸ A set A is connected when every continuous function f: A → {0, 1} ⊆ Z is
constant. Otherwise the open sets f⁻¹{0} and f⁻¹{1} cover and disconnect A.


Proposition 5.3 


A set C is connected ⇔ every non-trivial subset of C has a non-empty
boundary in C, that is,

∅ ≠ A ⊊ C ⇒ ∂_C A ≠ ∅.

Here, ∂_C A = { x ∈ C : ∀ε > 0, ∃a ∈ A ∩ C, ∃b ∈ C∖A, d(a, x) < ε,
d(b, x) < ε }.


Proof Let ∅ ≠ A ⊊ C be without a boundary in C. Then all the points of C are
either interior points or exterior points of A; thus A and B := C∖A are open in
C. But then there are open sets U, V in X, with A = U ∩ C and B = V ∩ C
(Example 2.12(3)), and

U ∩ B = U ∩ (C∖A) = (U ∩ C)∖A = ∅

(similarly V ∩ A = ∅), so C = A ∪ (C∖A) = A ∪ B is disconnected.

Conversely, if C is disconnected, then C = A ∪ B, with A ⊆ U, B ⊆ V, both
non-empty, and U, V open sets in X with A ∩ V = ∅ = B ∩ U. For any point a ∈ A,
a ∈ B_r(a) ⊆ U; hence

a ∈ { x ∈ C : d(x, a) < r } = B_r(a) ∩ C ⊆ U ∩ C = A

shows that A is open in C. Similarly B = C∖A is open in C, hence A is closed in
C. This leaves A without a boundary in C. □
Theorem 5.4


The connected subsets of R are precisely the intervals.


Proof Every non-trivial subset of an interval I ⊆ R has a boundary point: Let A
be a non-trivial subset of I; that A is non-trivial means that there exist a_0 ∈ A and
b_0 ∈ I∖A. We can assume a_0 < b_0, otherwise switch the roles of A and I∖A in
what follows.

Divide the interval [a_0, b_0] into halves, [a_0, c] and [c, b_0], where
c := (a_0 + b_0)/2 is the midpoint. If c ∈ A let [a_1, b_1] := [c, b_0], otherwise
if c ∉ A let [a_1, b_1] := [a_0, c]. Continue taking midpoints to get a nested
sequence of intervals [a_n, b_n] in I, with a_n ∈ A, b_n ∈ I∖A.

By the bisection property (Example 4.3(3)), the sequences (a_n) and (b_n) are
Cauchy and asymptotic, and since R is complete, they converge a_n → a and b_n → a.
The consequence is that, inside any open neighborhood B_ε(a), there are points a_n ∈ A
and b_n ∈ I∖A, making a a boundary point of A. From the preceding proposition,
this translates as “every interval is connected”.

Every connected subset C of R has the interval property a, b ∈ C ⇒
[a, b] ⊆ C: Let C be a connected set, and let a, b ∈ C (say, a < b). Any x ∈ [a, b]
which is not in C would disconnect C using the disjoint open sets ]−∞, x[ and
]x, ∞[.

Every subset of R with the interval property is an interval: Let A have the
interval property. If A ≠ ∅, say x ∈ A, and has an upper bound, then it has a least
upper bound b. The interval [x, b[ is a subset of A because there are points of A
arbitrarily close to b. Similarly if a is the greatest lower bound then ]a, x] ⊆ A.
Going through all the possibilities of whether A has upper bounds or lower bounds
or none, and whether these belong to A or not, results in all the possible cases of
intervals. For example, if it contains its least upper bound b but has no lower bound,
then [x, b] ⊆ A for any x < b, so that A = ]−∞, b]. □
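
The bisection argument in the first part of the proof is effectively an algorithm for locating a boundary point. The following sketch mirrors it under assumptions of our own: the set A is given by a membership test `in_A`, the example set A = { x ∈ [0, 2] : x² < 2 } is hypothetical, and the loop stops at a floating-point tolerance instead of passing to the limit.

```python
def boundary_point(in_A, a0, b0, tol=1e-12):
    """Bisection as in the proof: a0 lies in A, b0 does not.
    Keeps a_n in A and b_n outside A while halving [a_n, b_n]."""
    if not in_A(a0) or in_A(b0):
        raise ValueError("need a0 in A and b0 outside A")
    a, b = a0, b0
    while abs(b - a) > tol:
        c = (a + b) / 2
        if in_A(c):
            a = c      # [a_{n+1}, b_{n+1}] := [c, b_n]
        else:
            b = c      # [a_{n+1}, b_{n+1}] := [a_n, c]
    return a, b

# Hypothetical example set: A = { x in [0, 2] : x^2 < 2 }.
a, b = boundary_point(lambda x: x * x < 2, 0.0, 2.0)
print(a, b)
```

Both returned endpoints approximate the common limit of (a_n) and (b_n), which for this choice of A is √2.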


By contrast, the connected sets in other metric spaces may be very difficult to 
describe and imagine. Even in R², there are infinite connected sets such that when
a single point is removed, the remaining set is totally disconnected! (For further 
information search for “Cantor’s teepee”.) Connectedness is an important intrinsic 
property that a set may have: it is preserved by any continuous function. Even though 
the codomain space may be very different from the domain, a connected set remains 
in ‘one piece’. 
Proposition 5.5


Continuous functions map connected sets to connected sets,

f: X → Y continuous AND C ⊆ X is connected ⇒ fC is connected.

Proof Let C be a subset of X, and suppose fC is disconnected into the non-empty
disjoint sets A and B covered exclusively by the open sets U and V, that is,

fC = A ∪ B ⊆ U ∪ V,  U ∩ B = ∅ = V ∩ A.

Then,

C ⊆ f⁻¹A ∪ f⁻¹B ⊆ f⁻¹U ∪ f⁻¹V,  f⁻¹U ∩ f⁻¹B = ∅ = f⁻¹V ∩ f⁻¹A.

Moreover C ∩ f⁻¹A and C ∩ f⁻¹B are non-empty and disjoint, and f⁻¹U and f⁻¹V are
open sets (Theorem 3.7). Hence fC disconnected implies C is disconnected. □


Almost surprisingly, this simple proposition is the generalization of the classical 
“Intermediate Value Theorem” of Bolzano and Weierstrass. In effect, IVT has been
dissected into this abstract, but transparent, statement and the previous one that 
intervals are connected. 


Proposition 5.6 Intermediate Value Theorem 


Let C be a connected space, and f: C → R a continuous function. For
any a, b ∈ C and any c with f(a) < c < f(b) there exists an x ∈ C such that f(x) = c.


Proof fC is connected in R and so must be an interval. By the interval property,
f(a), f(b) ∈ fC ⇒ c ∈ fC, so c = f(x) for some x ∈ C. □
Exercises 5.7


1. Any two distinct points of a metric space are disconnected. More generally,
(a) any set of N points (N ≥ 2), (b) the union of two disjoint closed sets, are
disconnected.

2. The space of rational numbers Q is disconnected, e.g. using the open sets
]−∞, √2[ ∩ Q and ]√2, ∞[ ∩ Q. In fact Q is totally disconnected.

3. Suppose that there is an x ∈ X and an r > 0 such that d(x, y) ≠ r for all y ∈ X,
but there are points y with d(x, y) > r. Show that X is disconnected.

4. ▸ An open set (such as the whole metric space) is disconnected precisely when
it consists of (at least) two disjoint open subsets. Find a connected set whose
interior is disconnected.

5. * Any two disjoint non-empty closed sets A and B are completely separated in
the sense that there are disjoint open sets A ⊆ U, B ⊆ V, U ∩ V = ∅. (Hint:
use Exercise 3.12(17).)

6. ▸ A path is a continuous function I → X where I is an interval in R. Its
image is connected. Hence show that the parametric curves of geometry, such as
straight line segments, circles, ellipses, parabolas, and branches of hyperbolas
in R², are connected.

7. (a) The function f(x) := xⁿ is continuous on R, for n = 0, 1, .... Show that,
for any fixed n ≥ 1, xⁿ can be made arbitrarily large. Let y be a positive real
number; use the intermediate value theorem to show that ⁿ√y exists. More
generally every real monic polynomial xⁿ + ··· + a_1 x + a_0 (n ≥ 1), where
a_0 is negative or when n is odd, has a root.

(b) Every continuous function f: [0, 1] → [0, 1] has a fixed point. (Hint:
consider f(x) − x.)

8. If f: [0, 1]² → R is continuous and f(a) < c < f(b) then there is an x ∈
[0, 1]² such that c = f(x) (assuming [0, 1]² is connected).

9. Suppose X is connected and f: X → R is continuous and locally constant, that
is, every x ∈ X has a neighborhood taking the value f(x). Then f is constant
on X. (Hint: Show f⁻¹f(a) is closed and open in X.)

10. Q has non-interval subsets with the interval property (e.g. [0, √2[ ∩ Q).

11. Use the intermediate value theorem to show that a 1-1 continuous function on
[a, b] must be increasing or decreasing,

x ≤ y ⇒ f(x) ≤ f(y)  OR  x ≤ y ⇒ f(x) ≥ f(y).
5.2 Components


It seems intuitively clear that every space is the disjoint union of connected subsets. 
To make this rigorous, let us present some more propositions that go some way in 
helping us show whether a set is connected, especially the principle that whenever 
connected sets intersect, their union is connected; this allows us to build connected 
sets from smaller ones. 


Proposition 5.8 
If C is connected then so is C with some boundary points (in particular C̄).


Proof Let D be C with the addition of some boundary points. Suppose it separates
as D = A ∪ B each covered exclusively by open sets U and V. Then C would also
split up in the same way, unless C ⊆ U say. This cannot be the case, for let x ∈ B
be a boundary point covered by V. Then there is a ball B_ε(x) ⊆ V containing points
of C, a contradiction. Thus D disconnected implies C is disconnected. □


Theorem 5.9 


(i) If A_i, B are connected sets and ∀i, A_i ∩ B ≠ ∅ then B ∪ ⋃_i A_i is connected.
(ii) If A_n are connected for n = 1, 2, ..., and A_n ∩ A_{n+1} ≠ ∅ then ⋃_n A_n is
connected.


Proof (i) Suppose the union B ∪ ⋃_i A_i is disconnected and splits up into two parts
covered exclusively by open sets U and V. Then B would split up into the two parts
B ∩ U and B ∩ V were these to contain elements. But as B is known to be connected,
one of these must be empty, say B ∩ U = ∅. For any other A := A_i that is partly
covered by U (and there must be at least one) we get A ∩ V = ∅ and A ⊆ U, for
the same reason. But then A ∩ B ⊆ U ∩ B = ∅, contradicting the assumptions.

In particular, note that if A, B are connected and A ∩ B ≠ ∅, then A ∪ B is also
connected. But the statement is true even for an uncountable number of A_i.

(ii) If C_N := ⋃_{n=1}^N A_n is connected, then C_{N+1} = C_N ∪ A_{N+1} is also connected by
the first part of the theorem, since C_N ∩ A_{N+1} ≠ ∅. By induction C_N is connected
for all N. As A_1 ⊆ C_N for all N, it follows that ⋃_{N=1}^∞ C_N = ⋃_{n=1}^∞ A_n is also
connected. □


The converses of both these statements are false, but the following holds: 


Proposition 5.10 


Given non-empty connected sets A, B,

A ∪ B is connected ⇔ ∃x ∈ A ∪ B, {x} ∪ A and {x} ∪ B are connected.


Proof Suppose no point x ∈ A makes {x} ∪ B connected. That is, for each x ∈ A
there are two open sets which separate {x} ∪ B. Call the set which contains x, U_x, and
the other one V_x. They would also separate B unless B ⊆ V_x and U_x ∩ B = ∅. So
⋃_x U_x is an open set containing A but disjoint from B. If the same were to hold for
points in B, then there would be an open set containing B but disjoint from A, making
A ∪ B disconnected. The converse is a special case of the previous proposition. □


Theorem 5.11 


A metric space partitions into disjoint closed maximal connected subsets,
called components. Any connected set is contained in a component.


By a maximal connected set is meant a connected set C such that any A ⊇ C
(A ≠ C) is disconnected.


Proof The relation x ~ y, defined by {x, y} ⊆ C for some connected set C, is
trivially symmetric; it is reflexive since {x, x} = {x} is connected, and it is transitive
because if x, y ∈ C_1 and y, z ∈ C_2 then x, z ∈ C_1 ∪ C_2 which is connected by
Theorem 5.9 as y ∈ C_1 ∩ C_2. Moreover, another way of writing the relation x ~ y
is as

y ∈ ⋃ { C connected : x ∈ C },

so that the equivalence class [x] (called the component) of x is the union of all the
connected sets containing x. What this implies is that any connected set C that
contains x must be part of the component of x. In addition, the component is connected
by Theorem 5.9 and it is maximally so, as no strictly larger connected set containing
x can exist. In particular, since the closure of [x] is also connected (Proposition 5.8),
it must be the case that the closure equals [x], and [x] is closed (Proposition 2.16). □
Exercises 5.12


1. Show that R² is connected by considering the radial lines all intersecting the
origin.

2. ▸ More generally, if there exists a path between any two points, then the metric
space is connected. (It is enough to find a path between any point and a single
fixed point; why?) Such a space is said to be path-connected.

3. The square [0, 1]² and the half-plane ]a, ∞[ × R are connected.

4. Any disk in R² is path-connected. Do balls in a general metric space have to be
connected? Consider the space X := ]−∞, −1[ ∪ ]1, ∞[ and find a ball in this
space which is not connected.

5. ▸ If X, Y are connected spaces then so is X × Y.

6. The set R²∖{x} is connected. But R∖{x} is disconnected. Deduce that R and
R² are not homeomorphic.
Using the same idea, show that [a, b], [a, b[ and ]a, b[ are not homeomorphic
to each other, and neither is a circle to a parabola.

7. A connected metric space, such as R, has one component, itself. At the other
extreme, in totally disconnected spaces, the components are the single points
{a}, e.g. Q and Z.

8. If a subset of X has no boundary (so is closed and open) then it is the union of
components of X.

9. Components need not be open sets (e.g. in Q).

10. A metric space in which B_r(x) is connected for any x and any r sufficiently
small is said to be locally connected. Show that for a locally connected space X,

(a) the components are open in X,
(b) any convergent sequence converges inside some component,
(c) if X is also separable, then the components are countable in number.


Chapter 6 
Compactness 


6.1 Bounded Sets 


Definition 6.1 


A set B is bounded when the distance between any two points in the set has
an upper bound,

∃r > 0,  ∀x, y ∈ B,  d(x, y) ≤ r.

The least such upper bound is called the diameter of the set:

diam B := sup_{x,y ∈ B} d(x, y).


In everyday, but not very helpful, terms one can say that a bounded set does not 
“reach to infinity”, or even that it is “finite” in a geometric sense (the unit circle has 
an infinite number of points but is bounded in R²). The characteristic properties of
bounded sets are 


Proposition 6.2 


(i) Any subset of a bounded set is bounded.
(ii) The union of a finite number of bounded sets is bounded.


Proof (i) Let B be a bounded set with d(x, y) ≤ r for any x, y ∈ B. In particular
this holds for x, y in any subset A ⊆ B, so A is bounded.

(ii) Given a finite number of bounded sets B_1, ..., B_N, with diameters r_1, ..., r_N,
respectively, let r := max_n r_n. Pick a representative point from each set, a_n ∈ B_n, and take
the maximum distance between any two, r̃ := max_{m,n} d(a_m, a_n); it certainly exists
as there are only a finite number of such pairs. Now, for any two points x, y ∈ ⋃_n B_n,
that is, x ∈ B_i, y ∈ B_j, and using the triangle inequality twice,

d(x, y) ≤ d(x, a_i) + d(a_i, a_j) + d(a_j, y)
        ≤ r_i + r̃ + r_j
        ≤ 2r + r̃,

an upper bound for the distances between points in ⋃_{n=1}^N B_n is found. □

Examples 6.3 


1. In any metric space, finite subsets are bounded. In N, only the finite subsets are 
bounded (since d(ag, a,) < N for all n implies n < N). Consequently, N, Q, R, 
and C are all unbounded. 


2. In a discrete metric space, every subset is bounded. A metric space may be “large”
(non-separable) yet be bounded.


3. ▸ A set B is bounded ⇔ it is a subset of a ball,

∃r > 0,  ∃a ∈ X,  B ⊆ B_r(a).

Proof Balls (and their subsets) are obviously bounded,

∀x, y ∈ B_r(a),  d(x, y) ≤ d(x, a) + d(y, a) < 2r.

Conversely, if a non-empty set is bounded by R > 0, pick any points a ∈ X and
b ∈ B to conclude x ∈ B_r(a):

∀x ∈ B,  d(x, a) ≤ d(x, b) + d(b, a) < R + 1 + d(b, a) =: r.


4. The set [0, 1[U]2, 3[C R is bounded because it can be covered by the ball B3(0), 
or because it is the union of two bounded sets. 


5. » Boundedness is not necessarily preserved by continuous functions: If B is 
bounded and f is a continuous function, then fB need not be bounded. Worse, a 
set may be bounded in one metric space X, but unbounded in a homeomorphic 
copy Y. 

For example, N with the standard metric is unbounded, but its homeomorphic 
copy, N with the discrete metric, is bounded. 
Exercises 6.4

1. The set [−1, 1[ is bounded in R with diameter 2; in fact diam[a, b[ = b − a.


2. Show that if diam(A) ≤ r, diam(B) ≤ s, and assuming A ∩ B ≠ ∅ then
diam(A ∪ B) ≤ r + s.


3. Any closed ball B_r[a] := { x : d(x, a) ≤ r } is bounded; hence the closure of a
bounded set is bounded.


4. » Cauchy sequences are bounded (Example 4.3(5)). So unbounded sequences 
cannot possibly converge. 


5. » Prove that Lipschitz functions map bounded sets to bounded sets (Exercise 
4.17(3)). So equivalent metric spaces have corresponding bounded subsets. 


6.2 Totally Bounded Sets 


We have seen that boundedness is not an intrinsic property of a set, as it is not 
necessarily preserved by continuous functions. Let us try to capture the “finiteness” 
of a set with another definition: 


Definition 6.5 


A subset B ⊆ X is totally bounded when it can be covered by a finite number
of ε-balls, however small their radii ε,

∀ε > 0,  ∃N ∈ N,  ∃a_1, ..., a_N ∈ X,  B ⊆ ⋃_{n=1}^N B_ε(a_n).
Easy Consequences


1. Any subset of a totally bounded set is totally bounded (the same ε-cover of the
parent covers the subset).


2. A finite union of totally bounded sets is totally bounded (the finite collection of
ε-covers remains finite).


3. A totally bounded set is bounded (it is a subset of a finite number of bounded
balls).
Examples 6.6


1. The interval [0, 1] is totally bounded in R because it can be covered by the balls
B_ε(nε) for n = 0, ..., N, where 1/ε − 1 < N ≤ 1/ε.


2. Not all bounded sets are totally bounded. For example, in a discrete metric space,
any subset is bounded but only finite subsets are totally bounded (take ε < 1).


3. ▸ A totally bounded space X is separable.


Proof For each n = 1, 2, ..., consider finite covers of X by balls B_{1/n}(a_{i,n}) and
let A_n := { a_{i,n} } be the finite set of the centers, so A := ⋃_{n=1}^∞ A_n is countable.
For any ε > 0 and any point x ∈ X, let n > 1/ε, then x is covered by some ball
B_{1/n}(a_{i,n}), i.e., d(x, a_{i,n}) < ε, thus Ā = X.


4. The center points a_n of the definition may, without loss of generality, be
assumed to lie in B. Otherwise cover B with balls B_{ε/2}(x_n), and take
representative points a_n ∈ B ∩ B_{ε/2}(x_n) whenever non-empty; then ⋃_n B_ε(a_n) ⊇
B ∩ ⋃_n B_{ε/2}(x_n) ⊇ B.
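
The cover in Example 6.6(1) is easy to check numerically. This is a small sketch under assumptions of our own: ε = 0.07 is an arbitrary choice, coverage is only tested on a grid of sample points rather than proved, and the names `interval_centers` and `covered` are hypothetical.

```python
import math

def interval_centers(eps):
    """Centers n*eps, n = 0..N, where N is the integer with 1/eps - 1 < N <= 1/eps."""
    N = math.floor(1 / eps)
    return [n * eps for n in range(N + 1)]

def covered(x, centers, eps):
    """True when x lies in some open ball B_eps(center)."""
    return any(abs(x - c) < eps for c in centers)

eps = 0.07
centers = interval_centers(eps)
samples = [k / 1000 for k in range(1001)]
print(len(centers), all(covered(x, centers, eps) for x in samples))
```

Halving ε roughly doubles the number of centers, but the number stays finite for every ε, which is the content of total boundedness.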


Proposition 6.7 


A uniformly continuous function maps totally bounded sets to totally 
bounded sets. 


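Proof Let f: X → Y be a uniformly continuous function,

∀ε > 0,  ∃δ > 0,  ∀x ∈ X,  fB_δ(x) ⊆ B_ε(f(x)).

Let A be a totally bounded subset of X, covered by a finite number of balls A ⊆
⋃_{n=1}^N B_δ(x_n). Then

fA ⊆ ⋃_{n=1}^N fB_δ(x_n) ⊆ ⋃_{n=1}^N B_ε(f(x_n)),

so fA is covered by a finite number of ε-balls. □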
A totally bounded set is geometrically ‘finite’, so an infinite sequence of points
in a totally bounded set is caged in, so to speak, with nowhere to escape to:


Theorem 6.8


A set B is totally bounded ⇔ every sequence in B has a Cauchy subsequence.
Proof Let the totally bounded set K be covered by a finite number of balls of radius
1, and let {x_1, x_2, ...} be an infinite subset of K. (If K is finite, a selected sequence
must take some value x_i infinitely often and so has a constant subsequence.) A finite
number of balls cannot cover an infinite set of points, unless at least one of the balls,
B_1(a_1), has an infinite number of these points, say {x_{1,1}, x_{2,1}, ...}.

Now cover K with a finite number of 1/2-balls. For the same reason as above, at
least one of these balls, B_{1/2}(a_2), covers an infinite number of points of {x_{n,1}}, say
the new subset {x_{1,2}, x_{2,2}, ...}. Continue this process forming covers of 1/m-balls and
infinite subsets {x_{n,m}} of B_{1/m}(a_m). The sequence (x_{n,n}) is Cauchy, since for m ≤ n,
both x_{m,m} and x_{n,n} are elements of the set {x_{1,m}, x_{2,m}, ...}, and so d(x_{n,n}, x_{m,m}) <
2/m → 0 as n, m → ∞.

For the converse, start with any a_1 ∈ A. If B_ε(a_1) covers A then there is a
single-element ε-ball cover. If not, pick a_2 in A but not in B_ε(a_1). Continue like this to
get a sequence of distinct points a_n ∈ A with a_n ∉ ⋃_{i=1}^{n−1} B_ε(a_i), all of which are
at least ε distant from each other. This process cannot continue indefinitely else we
get a sequence (a_n) whose points are not close to each other, and so has no Cauchy
subsequence. So after some N steps we must have A ⊆ ⋃_{i=1}^N B_ε(a_i). □
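
The converse half of the proof describes a greedy procedure: keep picking a point outside all the ε-balls chosen so far. A minimal sketch of that procedure on a finite sample is given below; the random point cloud in the unit square, the seed, ε = 0.1 and the name `greedy_net` are all assumptions made for illustration.

```python
import math
import random

def greedy_net(points, eps):
    """Pick centers one at a time, each outside all earlier eps-balls,
    until every point lies within eps of some center."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

random.seed(0)  # fixed seed so the run is repeatable
cloud = [(random.random(), random.random()) for _ in range(2000)]
net = greedy_net(cloud, 0.1)
print(len(net))
```

The chosen centers are at least ε apart from each other, as in the proof, and the process stops because only finitely many such points fit in a totally bounded set.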


Exercises 6.9 


1. ▸ If X and Y are totally bounded metric spaces, then so is X × Y.
(Hint: If B_ε(x_n) (n = 1, ..., N) cover X and B_ε(y_m) (m = 1, ..., M) cover Y,
show that every point (x, y) ∈ X × Y lies in B_{2ε}(x_i, y_j) for some i ≤ N, j ≤ M.)

2. ▸ In R^N (and C^N), a set is bounded ⇔ it is totally bounded.
(Hint: Show that if B is a bounded set in R^N, with a bound R > 0, then B is a
subset of [−R, R]^N, which is totally bounded by the previous exercise.)

3. The set of values of a Cauchy sequence is totally bounded.


4. The closure of a totally bounded set is totally bounded.


5. Let B ⊆ Y ⊆ X, then B is totally bounded in Y ⇔ it is totally bounded in X
(Example 2.12(3)).


6. Any bounded sequence in R^N (or C^N) contains a convergent subsequence. (Hint:
x_n ∈ [−R, R]^N is totally bounded.)


7. A continuous function f: X — Y, with X, Y complete metric spaces, maps 
totally bounded subsets of X to totally bounded subsets of Y. (Hint: consider a 
sequence in f B for a totally bounded set B C X.) 


6.3 Compact Sets 


In the presence of completeness, continuous functions preserve totally bounded sets. 
Alternatively, we can strengthen the definition of boundedness even further to a 
property that is preserved by continuous functions; such a property is compactness, 
but it will emerge that compact sets are precisely the complete and totally bounded
subsets. 


Definition 6.10 


A set K is said to be compact when given any cover of balls (of possibly
unequal radii), there is a finite sub-collection of them that still cover the set (a
subcover),

K ⊆ ⋃_i B_{ε_i}(a_i)  ⇒  ∃i_1, ..., i_N,  K ⊆ ⋃_{n=1}^N B_{ε_{i_n}}(a_{i_n}).
Examples 6.11


1. Any finite set, including ∅, is compact.


2. The subset [0, 1[ ⊆ R is totally bounded but not compact. For example, the cover
using balls B_{1−1/n}(0) for n = 2, ... has no finite subcover. On the other hand,
we will soon see that the closed intervals [a, b] are compact.


3. ▸ Compact metric spaces are totally bounded, and so bounded and separable
(consider the cover by all ε-balls). Thus, R and N are not compact.
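
The failure of compactness in Example 6.11(2) can be made explicit: whichever finitely many balls B_{1−1/n}(0) are chosen, a point of [0, 1[ is left out. The following sketch exhibits such a point using exact rational arithmetic; the list of chosen indices and the helper names `uncovered_point` and `in_ball` are illustrative assumptions.

```python
from fractions import Fraction

def uncovered_point(indices):
    """Given finitely many indices n >= 2 of balls B_{1-1/n}(0),
    return a point of [0, 1[ that none of them contains."""
    N = max(indices)
    return 1 - Fraction(1, N)   # equals the largest radius, so lies outside every chosen ball

def in_ball(x, n):
    """Membership in the open ball B_{1-1/n}(0) of R."""
    return abs(x) < 1 - Fraction(1, n)

chosen = [2, 5, 17, 100]
x = uncovered_point(chosen)
print(x, any(in_ball(x, n) for n in chosen))
```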


An equivalent formulation of compactness is the following. By an open cover is
meant a cover consisting of open sets, K ⊆ ⋃_i A_i (A_i open subsets of X).


Proposition 6.12 


A set is compact < any open cover of it has a finite subcover. 


Proof Let open sets A_i cover a compact set, K ⊆ ⋃_i A_i. Each open set A_i consists
of a union of balls. It follows that K is included in a union of balls. By the definition
of compactness, there is a finite number of these balls B_{ε_1}(a_1), ..., B_{ε_N}(a_N) that still
cover the set K. Each of these balls is inside one of the open sets, say B_{ε_k}(a_k) ⊆ A_{i_k},
and

K ⊆ ⋃_{k=1}^N B_{ε_k}(a_k) ⊆ ⋃_{k=1}^N A_{i_k}

as claimed.
Conversely, suppose K is such that any open cover of it has a finite subcover. This
holds in particular for a cover of (open) balls, so K is compact. □




We will soon strengthen the following proposition to show that compact sets are 
complete, but the following proof is instructive, and remains valid in more general 
topological spaces: 


Proposition 6.13 


Compact sets are closed. 


Proof Let K be compact and x ∈ X∖K. To show x is exterior to K, we need to
surround it by a ball outside K. We know that x can be separated from any y ∈ K
by disjoint open balls B_{r_y}(x) and B_{r_y}(y) (Proposition 2.5). Since y ∈ B_{r_y}(y), these
latter balls cover K. But K is compact, so there is a finite sub-collection of these
balls that still cover K,

K ⊆ B_{r_1}(y_1) ∪ ··· ∪ B_{r_N}(y_N),

where r_i denotes the radius chosen for y_i. Now let r := min{r_1, ..., r_N}; then
B_r(x) ∩ K = ∅ since

z ∈ B_r(x) ⇒ z ∈ B_{r_i}(x) for i = 1, ..., N
           ⇒ z ∉ B_{r_1}(y_1) ∪ ··· ∪ B_{r_N}(y_N) ⊇ K.

Therefore, x ∈ B_r(x) ⊆ X∖K. □


Proposition 6.14 


(i) A closed subset of a compact set is compact.
(ii) A finite union of compact sets is compact.


Proof (i) Let F be a closed subset of a compact set K, and let the open sets A_i cover
F; then

K ⊆ F ∪ (X∖F) ⊆ ⋃_i A_i ∪ (X∖F).

The right-hand side is the union of open sets since X∖F is open when F is closed.
But K is compact and therefore a finite number of these open sets are enough to
cover it,

K ⊆ ⋃_{i=1}^N A_i ∪ (X∖F),  so  F ⊆ ⋃_{i=1}^N A_i.

(ii) Let the open sets A_i cover the finite union of compact sets K_1 ∪ ··· ∪ K_N. Then
they cover each individual K_n, and a finite number will then suffice in each case,
K_n ⊆ ⋃_k A_{i_k} (a finite union). For n = 1, ..., N, the collection of chosen A_{i_k}
remains finite, and together cover all the K_n. □


Compactness is strong enough that it is preserved by continuous functions; it is 
thus a truly intrinsic property of a set, as any homeomorphic copy of a compact set 
must also be compact. 


Proposition 6.15 


Continuous functions map compact sets to compact sets,

f: K ⊆ X → Y continuous AND K compact ⇒ fK compact.


Proof Let the sets A_i be an open cover for fK,

fK ⊆ ⋃_i A_i.

From this can be deduced

K ⊆ f⁻¹ ⋃_i A_i = ⋃_i f⁻¹A_i.

But f⁻¹A_i are open sets since f is continuous (Theorem 3.7). Therefore the right-hand
side is an open cover of K. As K is compact, a finite number of these open sets
will do to cover it,

K ⊆ ⋃_{k=1}^N f⁻¹A_{i_k}.

It follows that there is a finite subcover, fK ⊆ ⋃_{k=1}^N A_{i_k}, as required to show fK
compact. □


To summarize, 


Continuous functions preserve compactness, 
Uniformly continuous functions preserve total boundedness, 
Lipschitz continuous functions preserve boundedness. 


An immediate corollary is this statement from classical real analysis: 
Corollary 6.16


Let f: K → R be a continuous function on a compact space K. Then its
image fK is bounded, and the function attains its bounds,

∃x_0, x_1 ∈ K,  ∀x ∈ K,  f(x_0) ≤ f(x) ≤ f(x_1).


Proof The image fK is compact, and so bounded, fK ⊆ B_R(0), i.e., |f(x)| < R for
all x ∈ K. Moreover compact sets are closed and so contain their boundary points. In
particular fK contains inf fK and sup fK (Example 2.8(3)), i.e., inf fK = f(x_0),
sup fK = f(x_1) for some x_0, x_1 ∈ K. □


A property that holds locally, i.e., in a ball around any point, will often also hold in
a compact set by using a finite number of these balls. As an example of this, consider
a continuous function with compact domain. By the definition of continuity, any x
in the domain is surrounded by a small ball B_{δ_x}(x) on which the function varies by
at most a small fixed amount ε; on a compact domain, a finite number of these balls
and radii suffice to cover the set, so a single δ can be chosen irrespective of x. More
formally,


Proposition 6.17 


Any continuous function from a compact space to a metric space, $f: K \to Y$, is uniformly continuous.


If, moreover, f is bijective, then f is a homeomorphism. 


Proof (i) By continuity of $f$, every $x \in K$ has a $\delta_x$ for which $fB_{\delta_x}(x) \subseteq B_\epsilon(f(x))$ (Theorem 3.7). Since the balls $B_{\delta_x/2}(x)$ cover $K$, there is a finite subcover, from which can be chosen the smallest value $\delta$ of the radii $\delta_x$. Let $a, b \in K$ be any points with $d(a,b) < \delta/2$. The point $a$ is covered by a ball $B_{\delta_x/2}(x)$ from the finite list. Indeed, $B_{\delta_x}(x)$ covers $b$ too since

$$d(x,b) \le d(x,a) + d(a,b) < \delta_x/2 + \delta/2 \le \delta_x.$$

As both $a$ and $b$ belong to $B_{\delta_x}(x)$, their images under $f$ satisfy $f(a), f(b) \in B_\epsilon(f(x))$, so that

$$d(f(a), f(b)) \le d(f(a), f(x)) + d(f(x), f(b)) < 2\epsilon.$$

This inequality was achieved with one $\delta$ independently of $a$ and $b$, so $f$ is uniformly continuous.


(ii) If $f$ is continuous and onto, $Y = fK$ is compact. But when in addition it is also 1-1, it preserves open sets: if $A$ is open in $K$, then $K \setminus A$ is closed, hence compact, in $K$; this is mapped 1-1 to the closed compact set $f(K \setminus A) = Y \setminus fA$, implying that $fA$ is open in $Y$. This is precisely what is needed for $f^{-1}$ to be continuous, and thus for $f$ to be a homeomorphism. □
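The role of compactness in part (i) can be seen numerically. The following is only a rough sketch: the helper `oscillation`, the two sample functions, and the grid sizes are illustrative choices of ours, and a finite grid can suggest but never prove uniform continuity. It estimates how much $f$ can vary between two points at distance at most $\delta$.

```python
import math

def oscillation(f, lo, hi, delta, samples=10_000):
    """Largest |f(a) - f(b)| over sampled a in [lo, hi], with b = a + delta clipped to hi."""
    worst = 0.0
    for k in range(samples + 1):
        a = lo + (hi - lo) * k / samples
        b = min(a + delta, hi)
        worst = max(worst, abs(f(a) - f(b)))
    return worst

deltas = (1e-2, 1e-4, 1e-6)

# sqrt is continuous on the compact interval [0, 1]: the oscillation shrinks with delta.
sqrt_osc = [oscillation(math.sqrt, 0.0, 1.0, d) for d in deltas]

# 1/x is continuous on ]0, 1], which is not compact; the left end is sampled
# from 1e-9 as a stand-in for the open end, and the oscillation stays huge.
inv_osc = [oscillation(lambda x: 1.0 / x, 1e-9, 1.0, d) for d in deltas]
```

For the square root the estimates behave like $\sqrt{\delta}$, so one $\delta$ serves every point, whereas for $1/x$ no $\delta$ in the list controls the variation near $0$.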


We are now ready for some concrete examples, starting with that of R, the simplest 
non-trivial complete space. 


Proposition 6.18 Heine-Borel’s theorem 


The closed interval [a, b] is compact in R. 


Proof Let $\bigcup_i A_i \supseteq [a,b]$ be an open cover of the closed interval. We seek to obtain a contradiction by supposing there is no finite subcover. One of the two subintervals $[a, (a+b)/2]$ and $[(a+b)/2, b]$ (and possibly both) does not admit a finite subcover: call it $[a_1, b_1]$. Repeat this process of dividing, each time choosing a nested interval $[a_n, b_n]$ of length $(b-a)/2^n$ which does not admit a finite subcover.

Now $(a_n)$ and $(b_n)$ are asymptotic Cauchy sequences, which must therefore converge to the same limit, say, $a_n \to x$ and $b_n \to x$ (Proposition 4.2 and Theorem 4.5). This limit $x$ is in the set $[a,b]$ (Proposition 3.4) and is therefore covered by some open set $A_{i_0}$. As an interior point of it, $x$ can be surrounded by an $\epsilon$-ball (in this case, an interval)

$$x \in B_\epsilon(x) \subseteq A_{i_0}.$$

But $a_n \to x$ and $b_n \to x$ imply that there is an $N$ such that $a_N, b_N \in B_\epsilon(x)$, and so $[a_N, b_N] \subseteq B_\epsilon(x) \subseteq A_{i_0}$. This contradicts how $[a_N, b_N]$ was chosen not to be covered by a finite number of $A_i$'s, so there must have been a finite subcover to start with. □
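The bisection idea of this proof can be run as a procedure in a simplified setting. The sketch below makes assumptions that are not in the text: the cover consists of the intervals $]x - r(x), x + r(x)[$ for $x \in [a,b]$, the function $r$ is positive and bounded below, and the names `finite_subcover`, `radius` and `max_depth` are hypothetical. Unlike the proof, it only tests the interval centred at the midpoint, which is why the lower bound on $r$ is needed for termination.

```python
def finite_subcover(a, b, radius, max_depth=60):
    """Centres x of finitely many intervals ]x - radius(x), x + radius(x)[ covering [a, b]."""
    chosen = []
    stack = [(a, b, 0)]
    while stack:
        lo, hi, depth = stack.pop()
        mid = (lo + hi) / 2
        if (hi - lo) / 2 < radius(mid):
            # [lo, hi] lies inside the interval centred at its midpoint
            chosen.append(mid)
        elif depth >= max_depth:
            raise RuntimeError("gave up: is radius bounded below on [a, b]?")
        else:
            # bisect, as in the proof, and treat each half separately
            stack.append((mid, hi, depth + 1))
            stack.append((lo, mid, depth + 1))
    return sorted(chosen)

radius = lambda x: 0.01 + x * x / 4      # an arbitrary positive radius function
centres = finite_subcover(0.0, 1.0, radius)
```

Here the bisection stops after at most six halvings, because every subinterval of half-length $1/128$ already fits inside the interval around its midpoint.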


The Heine-Borel theorem generalizes readily to arbitrary metric spaces. 


Theorem 6.19 


A set $K$ is compact $\Leftrightarrow$ $K$ is complete and totally bounded.


Proof Compact sets are totally bounded: Let $K$ be a compact set. For any $\epsilon > 0$, cover $K$ with the balls $B_\epsilon(x)$ for all $x \in K$. This open cover has a finite sub-cover.

Compact sets are complete: Let $(x_n)$ be a Cauchy sequence which has no limit in $K$, so that for each $x \in K$,

$$\exists \epsilon > 0,\ \forall N,\ \exists n \ge N,\quad d(x_n, x) \ge \epsilon.$$

For this $\epsilon$ (which may depend on $x$),

$$\begin{aligned}
&\exists M,\quad n, m \ge M \ \Rightarrow\ d(x_n, x_m) < \epsilon/2,\\
&\epsilon \le d(x_n, x) \le d(x_n, x_m) + d(x_m, x) < \epsilon/2 + d(x_m, x),\\
&m \ge M \ \Rightarrow\ d(x_m, x) > \epsilon/2.
\end{aligned}$$

For $m < M$, the distances $d(x_m, x)$ take only a finite number of values. Hence, for each $x \in K$, there is a small enough ball $B_{\epsilon(x)}(x)$ which contains no points $x_n$ unless $x_n = x$. This gives an open cover of $K$, which must have a finite sub-cover. But this implies that the sequence takes a finite set of values and so must eventually repeat and converge (Exercise 4.10(4)). In any case, there must be a limit in $K$.


Complete and totally bounded sets are compact: Let $K$ be a complete and totally bounded set. Suppose it to be covered by open sets $V_i$, but that no finite number of these open sets is enough to cover $K$. Since $K$ is totally bounded,

$$K \subseteq \bigcup_{i=1}^{N} B_1(y_i)$$

for some $y_i \in K$ (Example 6.6(4)). If each of these balls were covered by a finite number of the open sets $V_i$, then so would $K$. So at least one of these balls needs an infinite number of $V_i$'s to cover it; let us call this ball $B_1(x_1)$.

Now consider $B_1(x_1) \cap K$, also totally bounded. Once again, it can be covered by a finite number of balls of radius $1/2$, one of which does not have a finite subcover, say $B_{1/2}(x_2)$. Repeat this process to get a nested sequence of balls $B_{1/2^n}(x_n)$, with $x_n \in K$, none of which has a finite subcover. The sequence $(x_n)$ is Cauchy since $d(x_n, x_m) < 1/2^n$ (for $m > n$), and $K$ is complete, hence $x_n \to x$ in $K$.

But $x$ is covered by some open set $V_{i_0}$. Therefore there is an $\epsilon > 0$ such that

$$x \in B_\epsilon(x) \subseteq V_{i_0}.$$

Moreover since $1/2^n \to 0$ and $x_n \to x$, an $N$ can be found such that $1/2^N < \epsilon/2$ and $d(x_N, x) < \epsilon/2$, so that for $d(y, x_N) < 1/2^N$,

$$d(y, x) \le d(y, x_N) + d(x_N, x) < \epsilon,$$

i.e., $B_{1/2^N}(x_N) \subseteq B_\epsilon(x) \subseteq V_{i_0}$, which contradicts the way that the balls $B_{1/2^n}(x_n)$ were chosen. □
Corollary 6.20


In a complete metric space, a subset $K$ is compact
$\Leftrightarrow$ $K$ is closed and totally bounded.


In $\mathbb{R}^N$, $K$ is compact $\Leftrightarrow$ $K$ is closed and bounded.


Proof In a complete metric space, a subset is complete if, and only if, it is closed (Proposition 4.7).

In the complete space $\mathbb{R}^N$, a set is totally bounded if, and only if, it is bounded (Exercise 6.9(2)). Note carefully that this remains true whether the distance is Euclidean, $D_1$, or $D_\infty$ (Example 2.2(6)). □
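Total boundedness asks for a finite number of $\epsilon$-balls covering the set. For a bounded subset of $\mathbb{R}^2$ such balls can be picked greedily, as in the following sketch. The function name `greedy_net`, the random sample standing in for the unit square, and the value of $\epsilon$ are our own illustrative choices, and the Euclidean distance is assumed.

```python
import math
import random

def greedy_net(points, eps):
    """Pick centres from `points` so that every point is within eps of some centre."""
    centres = []
    for p in points:
        # keep p as a new centre only if no existing centre is already close to it
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres

random.seed(0)
square = [(random.random(), random.random()) for _ in range(2000)]
net = greedy_net(square, 0.25)
```

The chosen centres are at least $\epsilon$ apart, so the disks of radius $\epsilon/2$ around them are disjoint and fit in a slightly enlarged square; comparing areas bounds the number of centres (by 31 here) however many points are sampled.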


Theorem 6.21 Bolzano-Weierstraß property


In a metric space, a subset $K$ is compact


$\Leftrightarrow$ every sequence in $K$ has a subsequence that converges in $K$
$\Leftrightarrow$ every infinite subset of $K$ has a limit point in $K$.


Proof (i) A compact set is totally bounded, and so every sequence in it has a Cauchy 
subsequence (Theorem 6.8). But compact metric spaces are also complete, implying 
convergence of this subsequence in K. 


(ii) Let $A$ be an infinite subset of $K$, and select a sequence of distinct terms $a_1, a_2, \ldots$ in $A$. Assuming that every sequence in $K$ has a convergent subsequence, then $a_{n_i} \to a \in K$, as $i \to \infty$. For any ball $B_\epsilon(a)$, there are an infinite number of points $a_{n_i} \in B_\epsilon(a)$, making $a$ a limit point of $A$ ($a$ can be equal to at most one of these distinct points). Thus $K$ satisfies the Bolzano-Weierstraß property that every infinite subset has a limit point in $K$.


(iii) Let $K$ have the Bolzano-Weierstraß property, let $(x_n)$ be any sequence in $K$ and let $A$ be the set of its values $\{x_0, x_1, x_2, \ldots\}$. If $A$ is infinite, then it has a limit point $x \in K$ and so there is a convergent subsequence $x_{n_i} \to x$ with $x_{n_i} \in A$ (Proposition 3.4). Otherwise, if $A$ is finite, one can pick a constant subsequence. In either case there is a (Cauchy) convergent subsequence in $K$.

This shows, firstly, that $K$ is totally bounded, and secondly, that every Cauchy sequence in $K$ converges in $K$ (Exercise 4.10(10)), that is, $K$ is complete. □


Exercises 6.22

1. A compact set that consists of isolated points is finite.

2. In $\mathbb{Z}$, and any discrete metric space, the compact subsets are finite.

3. Show that $[0,1] \cap \mathbb{Q}$ is closed and totally bounded in $\mathbb{Q}$ but not compact. (Hint: first show that $[0, r[ \cap \mathbb{Q}$ is not compact when $r$ is irrational.)

4. (Cantor) Let $K_n$ be a decreasing nested sequence of non-empty compact sets. If $\bigcap_n K_n = \emptyset$ then $X \setminus K_n$ ($n = 2, 3, \ldots$) form an open cover of $K_1$. Deduce that $\bigcap_n K_n$ is compact and non-empty. Moreover, if $\operatorname{diam} K_n \to 0$ then $\bigcap_n K_n$ consists of a single point.

5. The Cantor set is compact, totally disconnected, and has no isolated points (Exercise 2.20(7)). (In fact, it is the only non-empty space with these properties, up to homeomorphism.)

6. The least distance between a compact set and a disjoint closed subset of a metric space is strictly positive.


7. Suppose $K$ is a compact subset of $\mathbb{R}^2$ which lies in the half-plane $\{(x,y) : x > 0\}$. Show that the open disks with centers $(x + \frac{1}{x}, 0)$ and radii $x > 1$ cover the half-plane, and deduce that $K$ is enclosed by a circle that does not meet the $y$-axis.


8. The circle $S^1$ is compact; more generally, any continuous path $[0,1] \to X$ has a compact image.

9. Show that there can be no continuous map (i) $S^1 \to [0, 2\pi[$ which is onto, or (ii) $S^1 \to \mathbb{R}$ which is 1-1.

10. A continuous function $f: \mathbb{R}^2 \to \mathbb{R}$ takes a maximum, and a minimum, value on a continuous path $\gamma: [0,1] \to \mathbb{R}^2$. For example, there is a maximum and a minimum distance between points on the path and the origin. Give an example to show that this is false if $[0,1]$ is replaced by $]0,1]$.

11. If $f: X \to K$ is bijective and continuous, and $K$ is compact, it does not follow that $X$ is compact. Show that the mapping $f(\theta) := (\cos\theta, \sin\theta)$ for $0 \le \theta < 2\pi$, is a counter-example.

12. Generalize the Heine-Borel theorem to closed rectangles $[a,b] \times [c,d]$ in $\mathbb{R}^2$, by repeatedly dividing it into four sub-rectangles and adapting the same argument of the proof. Can you extend this further to $\mathbb{R}^N$?

13. ▶ The spheres and the closed balls in $\mathbb{R}^N$ are compact.

14. Verify that $[a,b] \cap \mathbb{Q}$ is not compact by finding an infinite set of rational numbers in $[a,b]$ that does not have a rational limit point.

15. Let $f: \mathbb{R}^N \to \mathbb{R}^N$ be a continuous function; consider the following iteration $x_{n+1} := f(x_n)/|f(x_n)|$ of mapping by $f$ and normalizing. Show that there is a convergent subsequence (one for each limit point), assuming $f(x_n) \ne 0$.

16. ▶ If $X$, $Y$ are compact metric spaces then so is $X \times Y$.

17. It is instructive to find an alternative proof that a continuous function maps a compact set to a compact set, using the BW property.
Karl Weierstraß (1815-1897) After belatedly becoming a secondary school mathematics teacher at 26 years, he privately studied Abel's subject of integrals and elliptic functions, until in 1854 he wrote a paper on his work and was given an honorary degree by the University of Königsberg. He then became famous with his programme of "arithmetization" based upon the construction of the real numbers, and of a function that is continuous but nowhere differentiable.


Fig. 6.1 Weierstraß


6.4 The Space C(X, Y) 


We are now ready to turn the set of continuous functions $f: [0,1] \to \mathbb{C}$ into a complete metric space $C[0,1]$, thereby giving one precise meaning to $f_n \to f$. To appreciate the difficulty involved, note that if we were to define $f_n \to f$ to mean pointwise convergence, that is, $f_n(x) \to f(x)$ for all $x \in [0,1]$, then we would get an incomplete space: The polynomials $x^n$ converge pointwise to a discontinuous function. In fact we will consider the more general case of bounded functions from any set to a metric space. A bounded function is one such that $\operatorname{im} f$ is bounded in the codomain $Y$, that is,


$$\exists r > 0,\ \forall a, b \in X,\quad d_Y(f(a), f(b)) \le r.$$


Theorem 6.23 


The space of bounded functions from a set $X$ to a metric space $Y$ is itself a metric space, with distance defined by


$$d(f, g) := \sup_{x \in X} d_Y(f(x), g(x)),$$


which is complete when $Y$ is.


It contains the closed subspace $C_b(X, Y)$ of bounded continuous functions, when $X$ is a metric space.


Proof Distance: The distance is well-defined because if $\operatorname{im} f$ and $\operatorname{im} g$ are bounded, then so is their union, and $d_Y(f(x), g(x)) \le \operatorname{diam}(\operatorname{im} f \cup \operatorname{im} g)$ for all $x \in X$. That $d$ satisfies the distance axioms follows from the same properties for $d_Y$:

$$\begin{aligned}
d(f,g) = 0 &\Leftrightarrow \forall x \in X,\ d_Y(f(x), g(x)) = 0\\
&\Leftrightarrow \forall x \in X,\ f(x) = g(x)\\
&\Leftrightarrow f = g,
\end{aligned}$$

$$\begin{aligned}
d(f,g) &= \sup_{x \in X} d_Y(f(x), g(x))\\
&\le \sup_{x \in X} \big(d_Y(f(x), h(x)) + d_Y(h(x), g(x))\big)\\
&\le \sup_{x \in X} d_Y(f(x), h(x)) + \sup_{x \in X} d_Y(h(x), g(x)) \quad \text{(Exercise 3.5(7b))}\\
&= d(f,h) + d(h,g).
\end{aligned}$$

The axiom of symmetry $d(g,f) = d(f,g)$ is easily verified.


Completeness: Let $f_n : X \to Y$ be a Cauchy sequence of bounded functions, then for every $x \in X$,

$$d_Y(f_n(x), f_m(x)) \le d(f_n, f_m) \to 0, \quad \text{as } n, m \to \infty$$

so $(f_n(x))$ is a Cauchy sequence in $Y$. When $Y$ is complete, $f_n(x)$ converges to, say, $f(x)$.

Normally, this convergence would be expected to depend on $x$, being slower for some points than others. In this case however, the convergence is uniform, as it is $d(f_n, f_m) = \sup_x d_Y(f_n(x), f_m(x))$ which converges to 0. So given any $\epsilon > 0$ there is an $N$, such that $d_Y(f_n(x), f_m(x)) < \epsilon/2$ for any $n, m \ge N$ and any $x \in X$. For each $x$, we can choose $m \ge N$, dependent on $x$ and large enough so that $d_Y(f_m(x), f(x)) < \epsilon/2$, and this implies

$$\forall x \in X,\quad d_Y(f_n(x), f(x)) \le d_Y(f_n(x), f_m(x)) + d_Y(f_m(x), f(x)) < \epsilon \tag{6.1}$$

for any $n \ge N$. Since this $N$ is independent of $x$, it follows that $d(f_n, f) \to 0$.

The function $f$ is bounded because for any $x, y \in X$, using (6.1),

$$\begin{aligned}
d_Y(f(x), f(y)) &\le d_Y(f(x), f_N(x)) + d_Y(f_N(x), f_N(y)) + d_Y(f_N(y), f(y))\\
&< \epsilon + R_{f_N} + \epsilon
\end{aligned} \tag{6.2}$$

with $N$ independent of $x$ and $y$, where $R_{f_N}$ is the diameter of $\operatorname{im} f_N$.


$C_b(X, Y)$ is closed: If $X$ is a metric space and $f_n$ are continuous, then this same inequality (6.2) shows that $f$ is also continuous: if $\delta_N$ is small enough, then

$$\begin{aligned}
d_X(x, y) < \delta_N &\Rightarrow d_Y(f_N(x), f_N(y)) < \epsilon\\
&\Rightarrow d_Y(f(x), f(y)) < 3\epsilon,
\end{aligned}$$

so that $f_n \to f \in C_b(X, Y)$. □


Often we write $C(X)$ for the complete metric space $C_b(X, \mathbb{C})$.




The convergence $f_n \to f$ in $C_b(X, Y)$ is called uniform convergence. It is much stronger than pointwise convergence $f_n(x) \to f(x)$, $\forall x \in X$; since $\|f_n - f\| = \sup_x |f_n(x) - f(x)|$ converges to 0, $f_n$ approximates $f$ for large $n$ at all values of $x$ uniformly.
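The polynomials $x^n$ on $[0,1[$ show the difference between the two notions concretely: at each fixed $x < 1$ the values tend to 0, yet for every $n$ there is a point where $x^n$ is still $1/2$, so the supremum distance to the pointwise limit never drops below $1/2$. The short sketch below checks this numerically; the helper name `sup_gap_witness` and the sample values of $n$ are our own.

```python
def sup_gap_witness(n):
    """Return a point x in [0, 1[ together with x**n, chosen so that x**n is about 1/2."""
    x = 0.5 ** (1.0 / n)
    return x, x ** n

ns = (10, 100, 1000)
pointwise = [0.9 ** n for n in ns]             # values at the fixed point x = 0.9 tend to 0
gaps = [sup_gap_witness(n)[1] for n in ns]     # but the distance to the limit 0 stays near 1/2
```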

Recall that continuous functions on a compact domain are uniformly continuous (Proposition 6.17). Thus any ball of a fixed radius $\delta$ is mapped by a real-valued continuous function $f$ into a ball of radius $\epsilon$. So, if $[a,b] \subset \mathbb{R}$ is partitioned into intervals $[x_i, x_i + \delta[$, then $f$ maps each into an interval of length at most $2\epsilon$. Replacing $f$ by the constant value $f(x_i)$ on each interval gives a uniform approximation by a "step" function. Of course, step functions are usually discontinuous. We can improve the approximation by constructing a function consisting of straight-line segments from one end-point $(x_i, f(x_i))$ to the next $(x_i + \delta, f(x_i + \delta))$. In fact, extending this idea further, one can find quadratic or cubic polynomial fits, called "splines", that are widely used to approximate real continuous functions, but these spline polynomials do not normally join up together as a single polynomial on $[a,b]$. Such a line of argument does give a valid proof that $C[a,b]$ is separable; in fact one can even generalize it to show that $C(K)$ is separable whenever $K$ is a compact metric space. Stone's theorem goes further and shows that the continuous complex-valued functions on any compact subset $K$ of $\mathbb{C}$ can be approximated by polynomials on $K$.


Theorem 6.24 Stone-Weierstraß


The polynomials (in $z$ and $\bar{z}$) are dense in $C(K)$, when $K \subset \mathbb{C}$ is compact.


Proof The proof is in five steps. The first two steps show that if a real-valued function $f \in C(K)$ can be approximated by a polynomial $p$, then another polynomial can be found that approximates $|f|$. Since the maximum of two functions $\max(f, g)$ can be written in terms of $|f - g|$, it can also be approximated by polynomials if $f$ and $g$ can. The fourth step, which is the main one, shows how a piecewise-linear approximation of $f \in C(K)$ can be written in terms of max and min. Together these steps prove that the polynomials $\mathbb{R}[x, y]$ are dense in the space of real continuous functions on $K$. The final step extends this to complex-valued continuous functions.


(i) There are real polynomials that approximate $|x|$ on $-1 \le x \le 1$: For example, let $q_1(x) := x^2$, $q_2(x) := 2x^2 - x^4$, ..., defined iteratively by

$$q_{n+1}(x) := q_n(x) + (x^2 - q_n(x)^2), \quad \text{starting from } q_0(x) := 0.$$

Let $y_n := q_n(x)$ for brevity, where $0 \le x \le 1$. Notice that

$$\begin{aligned}
y_{n+1} - x &= y_n - x - (y_n - x)(y_n + x)\\
&= (y_n - x)(1 - x - y_n).
\end{aligned}$$

When $|y_n - x| \le |y_1 - x| = x - x^2$, we get

$$\begin{aligned}
& x^2 \le y_n \le 2x - x^2 \le 1\\
\Rightarrow\ & -x \le 1 - x - y_n \le 1 - x\\
\Rightarrow\ & |y_{n+1} - x| \le c\,|y_n - x|
\end{aligned}$$

where $c := \max(x, 1 - x) < 1$. By induction, it follows that as $n \to \infty$,

$$|y_n - x| \le c^{n-1}|y_1 - x| \to 0.$$

The special cases $x = 0$ and $x = 1$ converge immediately to 0 and 1 respectively, while $q_n(x) \to |x|$ when $x \in [-1, 0[$ by the symmetry of the expression in the definition of $q_n$.

Moreover, the convergence is uniform in $x$ (certainly for $0 \le x < \epsilon$ and $1 - \epsilon < x \le 1$, but for the other positive values of $x$ it takes at most $-2\log\epsilon/\epsilon$ iterates for $|y_n - x| \le c^{n-1} < \epsilon$).
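(ii) Let $f \in C(K, \mathbb{R})$ ($f \ne 0$) with $c := \max_{x \in K} |f(x)| > 0$ (Corollary 6.16). Then the scaled function $F := f/c$ takes values in $[-1, 1]$ so $|F|$ can be approximated by $q \circ F$, where the polynomial $q$ approximates the function $x \mapsto |x|$ on $[-1, 1]$. If the polynomial $p$ approximates $f$, it can be expected that the polynomial $c\,q \circ (p/c)$ ought to approximate $|f|$ in $C(K)$. This indeed holds since $q$ is uniformly continuous on $[-1, 1]$,

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall x \in K,\quad |F(x) - a| < \delta \ \Rightarrow\ |q \circ F(x) - q(a)| < \epsilon$$

so writing $P := p/c$,

$$\begin{aligned}
d(f, p) < c\delta &\Rightarrow d(F, P) < \delta\\
&\Rightarrow d(q \circ F, q \circ P) < \epsilon\\
&\Rightarrow d(|F|, q \circ P) \le d(|F|, q \circ F) + d(q \circ F, q \circ P)\\
&\qquad\qquad \le d(|x|, q)_{C[-1,1]} + d(q \circ F, q \circ P) < 2\epsilon\\
&\Rightarrow d(|f|, c\,q \circ P) < 2c\epsilon.
\end{aligned}$$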


(iii) For real functions, define $\max(f, g)(x) := \max(f(x), g(x))$ as well as $\min(f, g)(x) := \min(f(x), g(x))$; a short exercise shows that

$$\max(f, g) = (f + g + |f - g|)/2, \qquad \min(f, g) = (f + g - |f - g|)/2.$$

But if $f$ and $g$ can be approximated by polynomials, then so can their sum and difference, and by (ii), also $|f - g|$, and hence $\max(f, g)$ and $\min(f, g)$.


(iv) The real polynomials are dense among the real continuous functions $C(K, \mathbb{R})$: Let $f \in C(K, \mathbb{R})$; for any $z \ne w$ in $K$, there is a linear function (a polynomial) $p_{z,w}$ which agrees with $f$ at the points $z$, $w$, i.e., $p_{z,w}(z) = f(z)$, $p_{z,w}(w) = f(w)$.

For a fixed $z$, let

$$U_{z,w} := \{a \in K : p_{z,w}(a) < f(a) + \epsilon\} = (f - p_{z,w})^{-1}\,]{-\epsilon}, \infty[$$

a non-empty open set (since $f - p_{z,w}$ is continuous and $U_{z,w}$ contains $z$). As $w \in U_{z,w}$, we have $K \subseteq \bigcup_{w \ne z} U_{z,w}$; but $K$ is compact so it can be covered by a finite number of subsets of this open cover, $K = U_{z,w_1} \cup \cdots \cup U_{z,w_N}$. Let $g_z := \min(p_{z,w_1}, \ldots, p_{z,w_N}) < f + \epsilon$; it is continuous and can be approximated by polynomials, from (iii).

Now let

$$V_z := \{a \in K : g_z(a) > f(a) - \epsilon\} = (f - g_z)^{-1}\,]{-\infty}, \epsilon[$$

a non-empty open set ($f - g_z$ is continuous, and $z \in V_z$). Once again, $K \subseteq \bigcup_z V_z$, and so $K = V_{z_1} \cup \cdots \cup V_{z_M}$. Let $h := \max(g_{z_1}, \ldots, g_{z_M})$, a continuous function which can be approximated by polynomials, since $g_{z_i}$ can. Furthermore $f - \epsilon < h < f + \epsilon$; and as this holds uniformly in $z$, we have $d(f, h) \le \epsilon$.


(v) The set of polynomials in $z$ and $\bar{z}$ is dense in $C(K)$: If $f \in C(K)$ is complex-valued, then it can be written as $f = u + iv$ with $u$, $v$ real-valued and continuous, that can be approximated by real polynomials $p$, $q$, say. Then,

$$\begin{aligned}
\forall z \in K,\quad & |(p(z) + iq(z)) - (u(z) + iv(z))| \le |p(z) - u(z)| + |q(z) - v(z)|\\
\Rightarrow\ & d(p + iq, u + iv) \le d(p, u) + d(q, v)
\end{aligned}$$

shows that $p + iq$ approximates $f$. But is, say, $x^2y + i(x^2 - xy^2)$ a polynomial in $z$? Not necessarily: for example, take the polynomial $x$ itself and suppose $\operatorname{Re}(z) = x = a_m z^m + \cdots + a_n z^n$ with $a_m \ne 0$ being the first non-zero coefficient; then $a_m = \lim_{z \to 0} \operatorname{Re}(z)/z^m$, but $\operatorname{Re}(z)/z^m$ can be made real or imaginary, so $a_m = 0$, a contradiction. Nevertheless, writing $x = (z + \bar{z})/2$ and $y = (z - \bar{z})/2i$ shows that every polynomial $p(x, y) + iq(x, y)$ is a polynomial in $z$ and $\bar{z}$. □
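The iteration of step (i) is easy to try out. The sketch below evaluates $q_n$ pointwise, rather than expanding it as a polynomial, and measures the largest error against $|x|$ on a grid in $[-1, 1]$; the function name `q`, the grid, and the chosen values of $n$ are ours. The estimate of step (i) gives $|q_n(x) - |x|| \le c^{n-1}(|x| - x^2) \le 1/(n+1)$, which the computed errors should respect.

```python
def q(n, x):
    """Evaluate q_n(x), where q_0 = 0 and q_{k+1} = q_k + (x^2 - q_k^2)."""
    y = 0.0
    for _ in range(n):
        y = y + (x * x - y * y)
    return y

grid = [k / 500 - 1 for k in range(1001)]     # sample points in [-1, 1]
errors = {n: max(abs(q(n, x) - abs(x)) for x in grid) for n in (1, 10, 100, 1000)}
```

The convergence is uniform but slow, roughly like $1/n$, with the worst points near $x = 0$ and $x = \pm 1$, where the contraction factor $c = \max(|x|, 1 - |x|)$ is close to 1.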


The last theorem in this chapter characterizes the totally bounded sets of the space $C(K, Y)$ of continuous functions on a compact space $K$ (in this case, $C(K, Y) = C_b(K, Y)$), in terms of an explicit property of families of functions:
Marshall Stone (1903-1989) studied at Harvard under Birkhoff (1926), with a thesis on ordinary differential equations and orthogonal expansions (Hermite, etc.). He then worked on spectral theory in Hilbert spaces, obtaining his big breakthrough in 1937 when he generalized the Weierstrass approximation theorem, which led him to the Stone-Čech compactification theory.


Fig. 6.2 Stone


Definition 6.25 


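A set $\mathcal{F} \subseteq C(X, Y)$ of continuous functions on metric spaces is said to be equicontinuous when


$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall f \in \mathcal{F},\ \forall x, x' \in X,\quad d(x, x') < \delta \ \Rightarrow\ d(f(x), f(x')) < \epsilon.$$


The equi in equicontinuous refers to the fact that $\delta$ is independent of $f \in \mathcal{F}$.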
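To see how the definition can fail, take the family $\sin nx$ ($n = 1, 2, \ldots$) on $[0, 1]$, chosen here purely as an illustration. Each function is uniformly continuous, but no single $\delta$ works for all of them with $\epsilon = 1$: for any $\delta$ the sketch below produces an $n$ and a point $x' < \delta$ with $|\sin nx' - \sin 0| = 1$. The helper name `sin_family_witness` is hypothetical.

```python
import math

def sin_family_witness(delta):
    """Return (n, x) with 0 < x < delta such that sin(n * x) = 1 while sin(n * 0) = 0."""
    n = math.floor(math.pi / (2 * delta)) + 1   # guarantees pi / (2n) < delta
    return n, math.pi / (2 * n)

witnesses = {delta: sin_family_witness(delta) for delta in (0.5, 0.1, 1e-3)}
```

By contrast, a family sharing one Lipschitz constant $c$ is equicontinuous, since $\delta := \epsilon / c$ then serves every member.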


Theorem 6.26 Arzelà-Ascoli


Let $K$ and $Y$ be metric spaces, with $K$ compact. Then


$\mathcal{F} \subseteq C(K, Y)$ is totally bounded $\Leftrightarrow$ $\mathcal{F}K$ is totally bounded in $Y$ and $\mathcal{F}$ is equicontinuous.


Here $\mathcal{F}K$ denotes the set $\{f(x) : f \in \mathcal{F}, x \in K\}$.


Proof (i) Let $\mathcal{F}$ be a totally bounded subset of $C(K, Y)$. This means that for any $\epsilon > 0$, there are a finite number of continuous functions $f_1, \ldots, f_n \in \mathcal{F}$ that are close to within $\epsilon$ of every other function in $\mathcal{F}$.


$\mathcal{F}K$ is totally bounded: Let $\epsilon > 0$ be arbitrary. Each $f_iK$ is compact (Proposition 6.15), so $\bigcup_{i=1}^{n} f_iK$ is totally bounded (Proposition 6.14 and Theorem 6.19), and covered by a finite number of balls $B_\epsilon(y_j)$, $j = 1, \ldots, m$. This means that for every $x \in K$ and $i = 1, \ldots, n$, $f_i(x)$ is close to some $y_j \in Y$. Combining this with the fact that any function $f \in \mathcal{F}$ is close to some $f_i$, gives

$$d(f(x), y_j) \le d(f(x), f_i(x)) + d(f_i(x), y_j) < 2\epsilon.$$

Thus each $f(x)$, where $f \in \mathcal{F}$ and $x \in K$, is close to some $y_j$ ($j$ depends on $x$ and $f$), in other words the finite number of balls $B_{2\epsilon}(y_j)$ cover $\mathcal{F}K$.

$\mathcal{F}$ is equicontinuous: We have seen previously that functions $f \in C(K)$, in particular $f_i$, are uniformly continuous (Proposition 6.17): each $\epsilon > 0$ gives parameters $\delta_i$. But we can say more. Since there are only a finite number of the functions $f_i$, the minimum $\delta := \min_i \delta_i$ can be chosen such that

$$\forall \epsilon > 0,\ \exists \delta > 0,\ \forall i,\ \forall x, x' \in K,\quad d(x, x') < \delta \ \Rightarrow\ d(f_i(x), f_i(x')) < \epsilon.$$

But indeed this works for any $f \in \mathcal{F}$:

$$\begin{aligned}
&\forall \epsilon > 0,\ \exists \delta > 0,\ \forall f \in \mathcal{F},\ \forall x, x' \in K,\quad d(x, x') < \delta \ \Rightarrow\\
&\qquad d(f(x), f(x')) \le d(f(x), f_i(x)) + d(f_i(x), f_i(x')) + d(f_i(x'), f(x')) < 3\epsilon.
\end{aligned}$$


(ii) Let $\mathcal{F}K$ be totally bounded and $\mathcal{F}$ be equicontinuous. Then $\mathcal{F}K$ can be covered by a finite number of balls $B_\epsilon(y_j)$, $j = 1, \ldots, m$, i.e., any value $f(x)$ for $f \in \mathcal{F}$ and $x \in K$ is close to some $y_j$ to within $\epsilon$. '$\mathcal{F}$ is equicontinuous' means that for any $\epsilon > 0$, the distance $d(f(x), f(x')) < \epsilon$ for any $f \in \mathcal{F}$, whenever $x$ and $x'$ are sufficiently close together to within some $\delta > 0$ that does not depend on $x$, $x'$, or $f$. We also require that $K$ is totally bounded, so that it can be covered by a finite number of balls of diameter less than $\delta$. By removing any overlaps between the balls, we can replace them by a finite partition of subsets $B_i$, $i = 1, \ldots, n$, each of diameter less than $\delta$.

For any $f \in \mathcal{F}$ and $x \in B_i$, $f(x)$ is close to some $y_j$, $d(f(x), y_j) < \epsilon$. Indeed, for any other $x' \in B_i$, we have

$$d(f(x'), y_j) \le d(f(x'), f(x)) + d(f(x), y_j) < 2\epsilon,$$

because $d(x, x') < \delta$ and $\mathcal{F}$ is equicontinuous. In other words, the function $f$ maps each $B_i$ into a ball $B_{2\epsilon}(y_j)$ ($j$ depending on $i$), and the whole partitioned space $K$ into some of these balls. That is, we know $f$ to within the approximation $2\epsilon$, if we know precisely how it maps each $B_i$ to which ball $B_{2\epsilon}(y_j)$; this is equivalent to an "encoding" $i \mapsto j$ from $i = 1, \ldots, n$ to $j = 1, \ldots, m$. There are at most $m^n$ such maps, although not all need be represented by the functions in $\mathcal{F}$. For those combinations that are in fact represented by functions in $\mathcal{F}$, select one from each and denote it by $g_k$, $k = 1, \ldots, N$.

Going back to $f \in \mathcal{F}$, with an encoding $i \mapsto j$, pick $g_k$ with the same encoding. Then for any $x \in K$, pick $y_j$ close to $f(x)$ (and $g_k(x)$),

$$d(f(x), g_k(x)) \le d(f(x), y_j) + d(y_j, g_k(x)) < 4\epsilon$$

and taking the supremum over $x$, we have $d(f, g_k) \le 4\epsilon$. To summarize, the finite number of functions $g_k$ are close to within $4\epsilon$ to any function $f \in \mathcal{F}$, so that $\mathcal{F}$ is totally bounded. □
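The "encoding" in part (ii) can be mimicked on a computer. The following sketch departs from the proof in ways that should be kept in mind: the family is a finite sample of the functions $x \mapsto \sin(x + c)$ on $[0, 1]$ (our choice; they share the Lipschitz constant 1, so $\delta = \epsilon$ works), the centres are taken as $y_j = j\epsilon$, and each cell is encoded through the value at its left end-point rather than through a ball containing its whole image. All names (`encode`, `cells`, `family`, `groups`, `representatives`) are hypothetical.

```python
import math
from collections import defaultdict

def encode(f, cells, eps):
    """For each cell representative x_i, the index j of the centre y_j = j * eps nearest f(x_i)."""
    return tuple(round(f(x) / eps) for x in cells)

eps = 0.1
delta = eps                                    # valid because every function is 1-Lipschitz
cells = [i * delta for i in range(10)]         # left end-points of the cells partitioning [0, 1]
family = [lambda x, c=c: math.sin(x + c) for c in [k / 50 for k in range(315)]]

# Group the functions by their encoding i -> j.
groups = defaultdict(list)
for f in family:
    groups[encode(f, cells, eps)].append(f)

# One function per encoding plays the role of g_k in the proof.
representatives = [fs[0] for fs in groups.values()]
```

Two functions with the same encoding differ by less than $4\epsilon$ everywhere on $[0,1]$, by the same triangle-inequality argument as in the proof, so the representatives form a finite $4\epsilon$-net for the sampled family.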
Exercises 6.27

1. Show that $C(X)$ contains the closed subset $C_b(X, \mathbb{R})$.

2. ▶ Uniform convergence, $f_n \to f$ in $C(X)$, implies pointwise convergence. Plot the functions (i) $f(nx)$ on $[0, 1]$, where $f(x) := \max(0, x(1 - x))$, and (ii) $x \mapsto 1/(1 + nx)$ on $]0, \infty[$; then show they converge pointwise to 0 as $n \to \infty$, but not uniformly.


3. $\int f_n \to \int f$ and $f_n'(x) \to f'(x)$ need not hold if $f_n$ converges to $f$ pointwise. Show that $x \mapsto nx^n$ and $\frac{\sin nx}{n}$ are counterexamples in $C(0, 1)$.


4. * (Dini) If $K$ is compact and $f_n \in C(K)$ is an increasing sequence of real-valued functions, converging pointwise to $f \in C(K)$, then $f_n \to f$ in $C(K)$. (Hint: Cover $K$ by balls $B_\delta(x)$ inside which $f - \epsilon < f_n \le f$ for $n \ge N_x$.)

5. * The space $C[a, b]$ is separable (using piecewise linear functions with kinks at rational numbers), but $C(\mathbb{R}^+)$ is not.

6. The subspace of polynomials in $C[a, b]$ is not closed (and so is incomplete): construct a sequence of polynomials that converges to a non-polynomial continuous function in $C[0, 1]$.


7. If $J: X \to \tilde{X}$ and $L: Y \to \tilde{Y}$ are homeomorphisms then $f \mapsto L \circ f \circ J^{-1}$ is a homeomorphism between $C(X, Y)$ and $C(\tilde{X}, \tilde{Y})$.


8. Follow the proof of the Stone-Weierstrass theorem to find a quadratic approximation to the function $f(x) := \begin{cases} x & 0 \le x \le 1 \\ 0 & -1 \le x \le 0 \end{cases}$.


9. Suppose $f_n : K \to \mathbb{C}$ are continuous functions on a compact set $K$, converging pointwise to $f$. By the Arzelà-Ascoli theorem, if $f_n$ are equicontinuous and uniformly bounded ($|f_n(x)| \le c$ for all $x \in K$, $n \in \mathbb{N}$), then $f$ is also continuous.

10. A set of Lipschitz functions $f: [a, b] \to \mathbb{R}$ (Definition 4.14) with the same Lipschitz constant $c$, $|f(x) - f(y)| \le c|x - y|$, form a totally bounded set in $C[a, b]$. The fact that one $c$ works for all, implies that they are equicontinuous; and their collective image in $\mathbb{R}$ is bounded ($|x - y| \le |b - a|$), hence totally bounded.

11. Show that the set of functions $\{\sin x, \sin 2x, \ldots\}$ and $\{x, x^2, x^3, \ldots\}$ on $[0, 1]$ are not equicontinuous.


Part II 
Banach and Hilbert Spaces 


Chapter 7 
Normed Spaces 


7.1 Vector Spaces 


It is assumed that the reader has already encountered vectors and matrices before but 
a brief summary of their theory is provided here for reference purposes. 


Definition 7.1 


A vector space $V$ over a field $F$ is a set on which are defined an operation of vector addition $+ : V^2 \to V$ satisfying associativity, commutativity, zero and inverse axioms, and an operation of scalar multiplication $F \times V \to V$ that satisfies the respective distributive laws: for every $x, y, z \in V$ and $\lambda, \mu \in F$,


$$\begin{array}{ll}
x + (y + z) = (x + y) + z, & \lambda(x + y) = \lambda x + \lambda y,\\
x + y = y + x, & (\lambda + \mu)x = \lambda x + \mu x,\\
0 + x = x, & (\lambda\mu)x = \lambda(\mu x),\\
x + (-x) = 0, & 1x = x.
\end{array}$$
Review


1. $(-1)x = -x$, $-(-x) = x$, $0x = 0$, $\lambda 0 = 0$. There is little danger that the zero scalar is confused with the zero vector, so no attempt is made to distinguish them.


2. F is itself a vector space with scalar multiplication being plain multiplication. 
The smallest vector space is {0} (often written as 0). 


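3. The product of vector spaces (over the same field), $V \times W$, is a vector space with addition and scalar multiplication defined by

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} := \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \end{pmatrix}, \qquad \lambda \begin{pmatrix} x \\ y \end{pmatrix} := \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix}.$$

The zero in this case is $\binom{0}{0}$ and the negatives are $-\binom{x}{y} = \binom{-x}{-y}$. By extension, $F^N := F \times \cdots \times F$ is a vector space.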


4. If $V$ is a vector space, then so is the set of functions $V^A := \{f : A \to V\}$ (for any set $A$) with

$$(f + g)(a) := f(a) + g(a), \qquad (\lambda f)(a) := \lambda f(a).$$

The zero of $V^A$ is $0(x) := 0$, and the negatives are $(-f)(x) := -f(x)$.


5. A subset of a vector space V which is itself a vector space with respect to the 
inherited vector addition and scalar multiplication is called a linear subspace. 
Since associativity and commutativity are obviously inherited properties, one 
need only check that the non-empty subset is “closed” under vector addition 
and scalar multiplication (then the zero 0 = Ox and inverses —x = (—1)x are 
automatically in the set). There are always the trivial linear subspaces {0} and V. 


6. The intersection of linear subspaces is itself a linear subspace. 


7. An important example of a linear subspace is that generated by a set of vectors

$$[[A]] := \{\lambda_1 a_1 + \cdots + \lambda_n a_n : a_i \in A,\ \lambda_i \in F,\ n \in \mathbb{N}\},$$

with the convention that $[[\emptyset]] := \{0\}$. It is the smallest linear subspace that includes $A$, and we say that $A$ spans, or generates, $[[A]]$. Each element of $[[A]]$ is said to be a linear combination of the vectors in $A$.


8. The set $A$ is linearly independent when any vector $a \in A$ is not generated by the rest, $a \notin [[A \setminus \{a\}]]$. (In particular $A$ does not contain 0.) This is equivalent to saying that $\lambda a \in [[A \setminus \{a\}]] \Leftrightarrow \lambda = 0$, or that for distinct $a_i \in A$,

$$\sum_{i=1}^{n} \lambda_i a_i = 0 \ \Leftrightarrow\ \lambda_i = 0,\ i = 1, \ldots, n.$$


9. A vector generated by a linearly independent set $A$ has unique coefficients $\lambda_i$,

$$x = \sum_{i=1}^{n} \lambda_i a_i = \sum_{i=1}^{n} \mu_i a_i \ \Rightarrow\ \lambda_i = \mu_i,\ i = 1, \ldots, n.$$


10. A basis is a minimal set of generating vectors; it must be linearly independent. Conversely, every generating set of linearly independent vectors is a basis.


11. A vector space is said to be finite-dimensional when it is generated by a finite number of vectors, $V = [[a_1, \ldots, a_n]]$ ($:= [[\{a_1, \ldots, a_n\}]]$). The smallest such number of generating vectors is called the dimension of the vector space, denoted $\dim V$, and is equal to the number of vectors in a basis.

For example, $F$ has dimension 1, because it is generated by any non-zero element, while $\dim\{0\} = 0$. The linear subspace generated by two linearly independent vectors $[[x, y]]$ is 2-dimensional and is called a plane (passing through the origin).


12. The space of $m \times n$ matrices is a finite-dimensional vector space, generated by the $mn$ matrices $E_{ij}$ consisting of 0s everywhere with the exception of a 1 at row $i$ and column $j$.


13. If $V$ is finite-dimensional, then so is any linear subspace $W$, and $\dim W \le \dim V$ (strictly less if it is a proper subspace).


14. We write $A + B := \{a + b : a \in A \text{ AND } b \in B\}$ and $\lambda A := \{\lambda a : a \in A\}$ for any subsets $A, B \subseteq V$, e.g. $\mathbb{Q} + \mathbb{Q} = \mathbb{Q}$, $\mathbb{C} = \mathbb{R} + i\mathbb{R}$. Thus $\lambda(A \cup B) = \lambda A \cup \lambda B$, and $\lambda(A \cap B) = \lambda A \cap \lambda B$ ($\lambda \ne 0$); a non-empty set $A$ is a linear subspace when $\lambda A + \mu A \subseteq A$ for all $\lambda, \mu \in F$. For brevity, $x + A$ is written instead of $\{x\} + A$; it is a translation of the set $A$ by the vector $x$. Care must be taken in interpreting these symbols: $A - A = \{a - b : a, b \in A\}$ is not usually $\{0\}$.


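15. For non-empty subsets of $\mathbb{R}$, $\sup(A + B) \le \sup A + \sup B$, $\sup(\lambda A) = \lambda \sup A$ ($\lambda \ge 0$).

Proof Let $a + b \in A + B$, then $a \le \sup A$ and $b \le \sup B$, so $\sup A + \sup B$ is an upper bound of $A + B$, and hence greater than its least upper bound.

Similarly, for all $a \in A$, $a \le \sup A \Rightarrow \lambda a \le \lambda \sup A$, so $\sup(\lambda A) \le \lambda \sup A$. Hence, for $\lambda > 0$, $\sup A = \sup(\frac{1}{\lambda}\lambda A) \le \frac{1}{\lambda}\sup(\lambda A)$ and equality holds.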


16. The space $V$ is said to decompose as a direct sum of its subspaces $M$ and $N$, written $V = M \oplus N$, when $V = M + N$ and $M \cap N = 0$. For example, $\mathbb{R}^2 = [[\binom{1}{0}]] \oplus [[\binom{0}{1}]]$.


17. ▶ For vector spaces over $\mathbb{R}$ or $\mathbb{C}$, a subset $C$ is said to be convex when it contains the line segment between any two of its points,

$$\forall x, y \in C,\quad 0 \le t \le 1 \ \Rightarrow\ tx + (1 - t)y \in C,$$

equivalent to $sC + tC = (s + t)C$ for $s, t \ge 0$. This generalizes easily to $t_1x_1 + \cdots + t_nx_n \in C$ when $t_1 + \cdots + t_n = 1$, $t_i \ge 0$, and $x_i \in C$. Clearly, linear subspaces are convex.


18. The intersection of convex sets is convex. There is a smallest convex set containing a set $A$, called its convex hull, defined as the intersection of all convex sets containing $A$, which equals

$$\{t_1x_1 + \cdots + t_nx_n : x_i \in A,\ t_i \ge 0,\ t_1 + \cdots + t_n = 1,\ n \in \mathbb{N}\}.$$
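The set notation $A + B$ and $\lambda A$ of this review is easily experimented with on finite sets of numbers. The small sketch below is only an illustration: the helper names `add_sets` and `scale_set` and the sample sets are ours. It confirms that $A - A$ is generally larger than $\{0\}$, and that for a non-convex set $2A$ can be strictly smaller than $A + A$, in contrast with the identity $sC + tC = (s + t)C$ for convex $C$.

```python
def add_sets(A, B):
    """The sum A + B = {a + b : a in A, b in B} of two finite sets of numbers."""
    return {a + b for a in A for b in B}

def scale_set(lam, A):
    """The scalar multiple lam * A = {lam * a : a in A}."""
    return {lam * a for a in A}

A = {0, 1, 2}
B = {10, 20}
difference = add_sets(A, scale_set(-1, A))    # A - A
double = scale_set(2, A)                      # 2A
self_sum = add_sets(A, A)                     # A + A
sup_of_sum = max(add_sets(A, B))              # sup(A + B) for finite sets
```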


Hausdorff’s Maximality Principle 


The Hausdorff Maximality Principle is a statement that can be used to possibly extend 
arguments that work in the finite or countable case to sets of arbitrary size. There are 
a few proofs in this book that make use of this principle; it is only needed to extend 
results to “uncountably infinite” dimensions. As such, it is mainly of theoretical 
value, and this section can be skipped if the main interest is in applications. 
Consider a collection $\mathcal{M}$ of subsets $M \subseteq X$ that satisfy a certain property $P$. A chain $\mathcal{C} = \{M_\alpha\}$ of such sets is a nested sub-collection, meaning that for any two sets $M_\alpha, M_\beta \in \mathcal{C}$, either $M_\alpha \subseteq M_\beta$ or $M_\beta \subseteq M_\alpha$. A chain can contain any number of nested subsets, even uncountable. A chain is called maximal when it cannot be added to by the insertion of any subset in $\mathcal{M}$. Hausdorff's maximality principle states that


Every chain in M is contained in some maximal chain in M. 


Hausdorff’s Maximality principle is often used to show there is a maximal set
$E$ that satisfies some property $P$ as follows: the empty chain can be extended to a
maximal chain of sets $M_\alpha$; if it can be shown that the union of this chain $E := \bigcup_\alpha M_\alpha$
also satisfies $P$, then there are no sets properly containing $E$ which satisfy $P$, by the
maximality of $\{M_\alpha\}$, i.e., $E$ is a maximal set in $\mathcal{M}$.

At the end of this chapter, it is proved that Hausdorff’s maximality principle 
implies the Axiom of Choice. Conversely, the Hausdorff Maximality principle can 
be proved from the Axiom of Choice (using the other standard set axioms), i.e., they 
are logically equivalent to each other, as well as to a number of other formulations 
such as Zorn’s lemma and the Well-Ordering principle. These statements are not 
constructive in the sense that they give no explicit way of finding the choice function 
or the maximal chain, but simply assert their existence. 

The purpose in introducing Hausdorff’s Maximality Principle here is to prove: 


Every vector space has a basis.

Proof Consider the collection of linearly independent sets of vectors in $V$. By
Hausdorff’s maximality principle, there is a maximal chain $\mathcal{M}$ of nested linearly
independent sets $A_\alpha$. We show that $E := \bigcup_\alpha A_\alpha$ is linearly independent and spans $V$,
hence a basis. If $\sum_{i=1}^n \lambda_i a_i = 0$ for $a_i \in E$, then each of the vectors $a_i$ ($i = 1, \dots, n$)
belongs to some $A_{\alpha_i}$, and hence they all belong to some single $A_\alpha$ because these sets
are nested in each other; but as $A_\alpha$ is linearly independent, $\lambda_i = 0$ for $i = 1, \dots, n$.

Thus $E$ is linearly independent. Suppose $E$ does not span $V$, meaning there is a
vector $v \notin [\![E]\!]$, so that $E \cup \{v\}$ is linearly independent. As it properly contains $E$
and every $A_\alpha$, it contradicts the maximality of the chain $\mathcal{M}$. □


7.2 Norms 


We would like to consider vector spaces having a metric space structure. Any set can 
be given a metric, so this is quite possible, but it is more interesting to have a metric 
that is related to vector addition and scalar multiplication in a natural way. Taking 
cue from Euclid’s ideas of congruence, the properties that we have in mind are 


(a) translation-invariance, distances between vectors should remain the same when
they are translated by the same amount,

$$d(x + a, y + a) = d(x, y),$$

(b) scaling-homogeneity, distances should scale in proportion when vectors are
scaled,

$$d(\lambda x, \lambda y) = |\lambda|\, d(x, y).$$


These properties are valid only for special types of metric. When $d$ is translation
invariant, then $d(x, y) = d(x - y, y - y) = d(x - y, 0)$ and $d$ becomes essentially
a function of one variable, namely the norm function $\|x\| := d(x, 0)$ with $d(x, y) =
\|x - y\|$. Conversely, any such $d$ defined this way is translation invariant because
$d(x + a, y + a) = \|x + a - y - a\| = d(x, y)$. This function is then scaling-homogeneous
precisely when

$$\|\lambda x\| = d(\lambda x, 0) = |\lambda|\, d(x, 0) = |\lambda|\,\|x\|.$$


What properties does a norm need to have, for d to be a distance? It is easy to see 
that 


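$$\begin{array}{lcl}
d(x, z) \le d(x, y) + d(y, z) & \Leftrightarrow & \|a + b\| \le \|a\| + \|b\| \\
d(y, x) = d(x, y) & \Leftrightarrow & \|{-a}\| = \|a\| \\
d(x, y) \ge 0 & \Leftrightarrow & \|a\| \ge 0 \\
d(x, y) = 0 \Leftrightarrow x = y & \Leftrightarrow & \|a\| = 0 \Leftrightarrow a = 0
\end{array}$$

where $a = x - y$, $b = y - z$. Of these, the symmetry property follows from scaling-homogeneity,
while positivity follows from $0 = \|x - x\| \le \|x\| + \|{-x}\| = 2\|x\|$.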


Definition 7.2 


A normed space $X$ is a vector space over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ with a function called
the norm $\|\cdot\| : X \to \mathbb{R}$ such that for any $x, y \in X$, $\lambda \in \mathbb{F}$,

$$\|x + y\| \le \|x\| + \|y\|, \qquad \|\lambda x\| = |\lambda|\,\|x\|, \qquad \|x\| = 0 \Rightarrow x = 0.$$

If necessary, norms on different spaces are distinguished by a subscript such as
$\|\cdot\|_Y$. A positive function that satisfies the first two axioms is termed a semi-norm.


Easy Consequences 

1. $\|x - y\| \ge \big|\, \|x\| - \|y\| \,\big|$.

2. $\|x_1 + \cdots + x_n\| \le \|x_1\| + \cdots + \|x_n\|$ (by induction).

Examples 7.3


1. The absolute value functions, |-|, for R and C are themselves norms, making these 
the simplest normed spaces. 


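2. ▸ The spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ of geometric vectors have a Euclidean norm defined by

$$\|a\|_2 := \Big( \sum_{i=1}^N |a_i|^2 \Big)^{1/2}.$$

There are other possibilities, e.g. $\|a\|_1 := \sum_{i=1}^N |a_i|$, or $\|a\|_\infty := \max_i |a_i|$. Thus

$$\big\|\tbinom{3}{4}\big\|_1 = 3 + 4 = 7, \qquad \big\|\tbinom{3}{4}\big\|_2 = \sqrt{9 + 16} = 5, \qquad \big\|\tbinom{3}{4}\big\|_\infty = \max(3, 4) = 4.$$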


The different norms give the different distances already defined in Example 2.2(6).

3. ▸ A sequence of vectors $x_n = (a_{1n}, \dots, a_{Nn})$ in $\mathbb{F}^N$ converges, $x_n \to a$ (in any
of these norms), precisely when each coefficient converges in $\mathbb{F}$, $a_{in} \to a_i$ for
$i = 1, \dots, N$.


Proof Using the 2-norm, for any fixed $i$,

$$|a_{in} - a_i|^2 \le |a_{1n} - a_1|^2 + \cdots + |a_{Nn} - a_N|^2 = \|x_n - a\|_2^2$$

so when the latter diminishes to 0, so does the left-hand side.

Conversely, if $a_{in} \to a_i$ for $i = 1, \dots, N$, then

$$\|x_n - a\|_2 = \sqrt{|a_{1n} - a_1|^2 + \cdots + |a_{Nn} - a_N|^2} \to 0,$$

by continuity of the various constituent functions.
With minor changes, the same proof works for the other norms as well.


4. More generally, we can define the norm $\|a\|_p := \big( \sum_{i=1}^N |a_i|^p \big)^{1/p}$ for $p \ge 1$.

Shortly, we will see that all these norms are equivalent in finite dimensions, so
we usually take the most convenient ones, such as $p = 1, 2, \infty$.
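
As a small numerical illustration of these finite-dimensional norms, the following Python sketch evaluates the $p$-norm of a vector for several values of $p$. The function name `p_norm` and the sample vector with entries 3 and 4 are choices made here for illustration only.

```python
import math

def p_norm(vector, p):
    """p-norm of a finite vector; p is a real number >= 1, or math.inf."""
    magnitudes = [abs(a) for a in vector]
    if p == math.inf:
        return max(magnitudes)
    return sum(m ** p for m in magnitudes) ** (1.0 / p)

a = (3, 4)                       # illustrative vector
norm_1 = p_norm(a, 1)            # 3 + 4
norm_2 = p_norm(a, 2)            # sqrt(9 + 16)
norm_3 = p_norm(a, 3)            # (27 + 64) ** (1/3)
norm_inf = p_norm(a, math.inf)   # max(3, 4)
print(norm_1, norm_2, norm_3, norm_inf)
```

For this vector the values decrease as $p$ grows, from the 1-norm down towards the max-norm.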


5. ▸ Sequences: sequences can be added and multiplied by scalars, and form a
vector space.

$$(a_0, a_1, \dots) + (b_0, b_1, \dots) := (a_0 + b_0, a_1 + b_1, \dots),$$
$$\lambda(a_0, a_1, \dots) := (\lambda a_0, \lambda a_1, \dots).$$

The zero sequence is $(0, 0, \dots)$ and $-(a_0, a_1, \dots) = (-a_0, -a_1, \dots)$.


The different norms introduced above generalize to sequences; the three most 
important normed sequence spaces are: 


(a) $\ell^1 := \{ (a_n) : \sum_{n=0}^\infty |a_n| < \infty \}$ with norm defined by

$$\|(a_n)\|_{\ell^1} := \sum_{n=0}^\infty |a_n|.$$

(b) $\ell^2 := \{ (a_n) : \sum_{n=0}^\infty |a_n|^2 < \infty \}$ with norm defined by

$$\|(a_n)\|_{\ell^2} := \sqrt{\sum_{n=0}^\infty |a_n|^2}.$$

(c) $\ell^\infty := \{ (a_n) : \exists c,\ |a_n| \le c \}$ with norm defined by $\|(a_n)\|_{\ell^\infty} := \sup_n |a_n|$.

For example, for the sequence $(1/n) = (1, \tfrac12, \tfrac13, \dots)$,

$$\|(1/n)\|_{\ell^1} = \infty, \qquad \|(1/n)\|_{\ell^2} = \pi/\sqrt{6}, \qquad \|(1/n)\|_{\ell^\infty} = 1.$$

In each case there are two versions of the spaces, depending on whether $a_n \in \mathbb{R}$
or $\mathbb{C}$; the scalar field is then, correspondingly, real or complex. By default, we
take the complex spaces as standard, unless specified otherwise.

Note carefully that an implicit assumption is being made here that adding two
sequences in a space gives another sequence in the same space. This follows from
the triangle inequality for the respective norm; it is left as an exercise for $\ell^1$ and
$\ell^\infty$, but is proved for $\ell^2$ in the next proposition. See Proposition 9.12 for $\ell^p$.
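
The three norms of the sequence $(1/n)$ can be explored numerically by truncating it after finitely many terms. The sketch below (the function name `partial_norms` and the truncation lengths are chosen here for illustration) suggests that the $\ell^1$ partial sums keep growing, while the $\ell^2$ values settle near $\pi/\sqrt{6}$ and the $\ell^\infty$ value stays at 1. It is only a numerical illustration, not a proof of divergence or convergence.

```python
import math

def partial_norms(num_terms):
    """Norms of the truncated sequence (1, 1/2, ..., 1/num_terms)."""
    terms = [1.0 / n for n in range(1, num_terms + 1)]
    l1 = sum(terms)                              # harmonic partial sum
    l2 = math.sqrt(sum(t * t for t in terms))    # tends to pi / sqrt(6)
    linf = max(terms)                            # always the first term
    return l1, l2, linf

l1_small, _, _ = partial_norms(1_000)
l1_large, l2_large, linf_large = partial_norms(200_000)
print(l1_small, l1_large)                        # keeps increasing with the length
print(l2_large, math.pi / math.sqrt(6))          # nearly equal
print(linf_large)
```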


6. These spaces are different from each other. Not only do they contain different
sequences, but convergence is different in each. For example, the sequences

$$\begin{aligned}
x_1 &:= (1, 0, 0, \dots) \\
x_2 &:= (1/2, 1/2, 0, \dots) \\
x_3 &:= (1/3, 1/3, 1/3, 0, \dots)
\end{aligned}$$

are all in $\ell^1$, $\ell^2$, and $\ell^\infty$. They converge $x_n \to 0$ in $\ell^\infty$ as $n \to \infty$,

$$\|x_n\|_{\ell^\infty} = \sup\{\tfrac{1}{n}, 0\} = 1/n \to 0.$$

(Show that they also converge to 0 in $\ell^2$.) But they do not converge in $\ell^1$,

$$\|x_n\|_{\ell^1} = \tfrac{1}{n} + \cdots + \tfrac{1}{n} = 1.$$

Thus, convergence of each coefficient is necessary, but not sufficient, for the
convergence of $x_n$.
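
The different behaviour of these sequences in the three norms can be tabulated with a few lines of Python. This is only a sketch; the helper name `block_sequence_norms` is made up here, and each $x_n$ is represented by its $n$ non-zero entries.

```python
def block_sequence_norms(n):
    """Norms of x_n = (1/n, ..., 1/n, 0, 0, ...) with n non-zero entries."""
    entries = [1.0 / n] * n
    l1 = sum(entries)                         # stays at 1 (up to rounding)
    l2 = sum(e * e for e in entries) ** 0.5   # equals 1 / sqrt(n)
    linf = max(entries)                       # equals 1 / n
    return l1, l2, linf

for n in (1, 10, 100, 1000):
    print(n, block_sequence_norms(n))
```

The $\ell^2$ and $\ell^\infty$ columns shrink towards 0 as $n$ grows, whereas the $\ell^1$ column does not.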


7. ▸ Functions $A \to \mathbb{F}$, where $A$ is an interval in $\mathbb{R}$, say, also form a vector space,
with

$$(f + g)(x) := f(x) + g(x), \qquad (\lambda f)(x) := \lambda f(x),$$

and different norms can be defined for them as well (once again, there are two
versions of each space, depending on whether the functions are real- or complex-valued):

(a) The space $L^1(A) := \{ f : A \to \mathbb{C},\ \int_A |f(x)|\,dx < \infty \}$ with norm
defined by

$$\|f\|_{L^1} := \int_A |f(x)|\,dx.$$

Or rather, this would be a norm, except that $\|f\|_{L^1} = \int_A |f(x)|\,dx = 0$ not
when $f = 0$ but when $f = 0$ a.e. (Section 9.2). The failure of this axiom
is not drastic, and those functions that are equal almost everywhere can be
identified into equivalence classes to create a proper normed space, called
Lebesgue space (Remark 2.23(1)). But to adopt a special notation for them,
such as $[f]$, would be too pedantic to be useful; the symbol $f$, when used in
the context of Lebesgue spaces, represents any function in its equivalence
class. (The same comment holds for the next two spaces.)


(b) The space $L^2(A) := \{ f : A \to \mathbb{C} : \int_A |f(x)|^2\,dx < \infty \}$, with norm
defined by $\|f\|_{L^2} := \big( \int_A |f(x)|^2\,dx \big)^{1/2}$. More generally there are the
$L^p(A)$ spaces for $p \ge 1$.

(c) The space

$$L^\infty(A) := \{ f : A \to \mathbb{C} : f \text{ is measurable AND } \exists c\ |f(x)| \le c \text{ a.e. } x \},$$

with norm defined by $\|f\|_{L^\infty} := \sup_{x \text{ a.e.}} |f(x)|$ (i.e., the smallest $c$ such
that $|f(x)| \le c$ a.e. $x$).


(d) The space $C_b(X, Y)$ of bounded continuous functions, defined previously
(Theorem 6.23), is a normed space when $Y$ is, with

$$\|f\|_C := \sup_{x \in X} \|f(x)\|_Y.$$

(Check that $d$ as defined on $C_b(X, Y)$ is translation-invariant and scaling-homogeneous.)
$C_b(X)$ is a linear subspace of $L^\infty(X)$, with the same norm.
Note that $C_b(\mathbb{N}) = \ell^\infty$.

For example, on $A := [0, 2\pi]$, $\|\sin\|_{L^1} = 4$, $\|\sin\|_{L^2} = \sqrt{\pi}$, and $\|\sin\|_{L^\infty} = 1$.
More details and proofs for the first three spaces can be found in Section 8.2.
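
The three values quoted for the sine function on $[0, 2\pi]$ can be checked approximately by numerical integration. The sketch below uses a plain midpoint rule and a grid maximum; the helper `midpoint_integral` and the number of steps are choices made here, and the grid maximum only approximates the essential supremum because the function is continuous.

```python
import math

def midpoint_integral(func, a, b, steps=100_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

a, b = 0.0, 2.0 * math.pi
sin_l1 = midpoint_integral(lambda x: abs(math.sin(x)), a, b)
sin_l2 = math.sqrt(midpoint_integral(lambda x: math.sin(x) ** 2, a, b))
sin_sup = max(abs(math.sin(a + k * (b - a) / 100_000)) for k in range(100_001))
print(sin_l1, sin_l2, sin_sup)   # compare with 4, sqrt(pi), 1
```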


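8. ▸ When $X$, $Y$ are normed spaces over the same field, $X \times Y$ is also a normed
space, with

$$\binom{x_1}{y_1} + \binom{x_2}{y_2} := \binom{x_1 + x_2}{y_1 + y_2}, \qquad \lambda \binom{x}{y} := \binom{\lambda x}{\lambda y}, \qquad \Big\| \binom{x}{y} \Big\| := \|x\|_X + \|y\|_Y.$$

The induced metric is $D_1$, defined previously for $X \times Y$ as metric spaces
(Example 2.2(6)).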


9. ▸ Suppose a vector space has two norms $\|\cdot\|$ and $|\!|\!|\cdot|\!|\!|$. Convergence with respect
to one norm is the same as convergence with respect to the other norm when they
are equivalent in the sense of metrics (Exercise 4.17(6)), i.e., there are positive
constants $c, d > 0$,

$$c\|x\| \le |\!|\!|x|\!|\!| \le d\|x\|.$$

Proof Suppose the inequalities hold and $\|x_n - x\| \to 0$, then

$$|\!|\!|x_n - x|\!|\!| \le d\,\|x_n - x\| \to 0$$

as well; similarly if $|\!|\!|x_n - x|\!|\!| \to 0$ then

$$\|x_n - x\| \le c^{-1}\,|\!|\!|x_n - x|\!|\!| \to 0.$$

Conversely, suppose the ratios $|\!|\!|x|\!|\!| / \|x\|$ approach 0 as $x$ varies in $X$. Then a
sequence of vectors $x_n$ can be found such that $\|x_n\| = 1$ but $|\!|\!|x_n|\!|\!| \le 1/n$, i.e.,
$x_n \to 0$ with respect to $|\!|\!|\cdot|\!|\!|$ but not with respect to $\|\cdot\|$. For this not to happen,

$$|\!|\!|x|\!|\!| / \|x\| \ge c > 0, \quad \text{and similarly,} \quad \|x\| / |\!|\!|x|\!|\!| \ge 1/d > 0.$$


A good strategy to adopt when tackling a question about normed spaces, is to
try to answer it first for $\mathbb{R}$ or $\mathbb{C}$, then $\mathbb{C}^N$, then for a sequence space such as $\ell^\infty$
or $\ell^1$, and finally for a function space $C[0, 1]$, $L^\infty[0, 1]$, or $L^1(\mathbb{R})$. Theoretically,
sequence spaces are useful as model spaces that are rich enough to exhibit most
generic properties of normed spaces. But they are also indispensable in practice: a
real-life function $f(t)$ is discretized, or digitized, into a sequence of numbers $f_n$,
before it can be manipulated by algorithms.


Let us justify the claim that $\ell^2$ is a normed space, by showing that the standard norm
$\|(a_n)\|_{\ell^2} = \sqrt{\sum_n |a_n|^2}$ satisfies the triangle inequality, even in infinite dimensions.


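Proposition 7.4 Cauchy’s inequality

For $a_n, b_n \in \mathbb{C}$,

$$\Big| \sum_{n=0}^\infty a_n b_n \Big| \le \sqrt{\sum_{n=0}^\infty |a_n|^2}\ \sqrt{\sum_{n=0}^\infty |b_n|^2},$$

$$\sqrt{\sum_{n=0}^\infty |a_n + b_n|^2} \le \sqrt{\sum_{n=0}^\infty |a_n|^2} + \sqrt{\sum_{n=0}^\infty |b_n|^2}.$$

Proof (i) It is easy to show from $(a - b)^2 \ge 0$ that $ab \le (a^2 + b^2)/2$ for any real
numbers $a, b$. Hence,

$$\Big( \sum_n a_n b_n \Big)^2 = \sum_{i,j} a_i b_i a_j b_j \le \sum_{i,j} (a_i^2 b_j^2 + a_j^2 b_i^2)/2 = \sum_i a_i^2 \sum_j b_j^2.$$

It follows that, for complex numbers $a_n, b_n$,

$$\Big| \sum_n a_n b_n \Big| \le \sum_n |a_n||b_n| \le \sqrt{\sum_n |a_n|^2}\ \sqrt{\sum_n |b_n|^2}.$$

(ii)

$$\begin{aligned}
\sum_n |a_n + b_n|^2 &\le \sum_n \big( |a_n|^2 + |b_n|^2 + 2|a_n b_n| \big) \\
&\le \sum_n |a_n|^2 + \sum_n |b_n|^2 + 2\sqrt{\sum_n |a_n|^2}\ \sqrt{\sum_n |b_n|^2} \\
&= \Big( \sqrt{\sum_n |a_n|^2} + \sqrt{\sum_n |b_n|^2} \Big)^2.
\end{aligned}$$

□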

Thus for any two real sequences $x = (a_n)$, $y = (b_n)$ in $\ell^2$, one can define their
‘dot product’

$$x \cdot y := \sum_{n=0}^\infty a_n b_n,$$

whose convergence is assured by Cauchy’s inequality. The identity $\|x\|^2 = x \cdot x$,
familiar for Euclidean spaces, remains valid for $\ell^2$. Note that the two inequalities
above can be written as $|x \cdot y| \le \|x\|\|y\|$ and $\|x + y\| \le \|x\| + \|y\|$, and that $x \cdot y$
need not be finite unless $x$ and $y$ are in $\ell^2$.
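
Both inequalities of Proposition 7.4 can be spot-checked on finite complex sequences. The following Python sketch draws two random sequences (the length 50, the seed, and the helper names are arbitrary choices made here) and compares the two sides; it illustrates the statements and is not a substitute for the proof.

```python
import random

def l2_norm(seq):
    """Square root of the sum of squared moduli."""
    return sum(abs(z) ** 2 for z in seq) ** 0.5

def random_complex_sequence(length):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(length)]

random.seed(0)
x = random_complex_sequence(50)
y = random_complex_sequence(50)

inner_sum = abs(sum(a * b for a, b in zip(x, y)))    # |sum a_n b_n|
cauchy_bound = l2_norm(x) * l2_norm(y)               # right-hand side of (i)
sum_norm = l2_norm([a + b for a, b in zip(x, y)])    # norm of the sum
triangle_bound = l2_norm(x) + l2_norm(y)             # right-hand side of (ii)
print(inner_sum <= cauchy_bound, sum_norm <= triangle_bound)
```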

Since the metric of a normed space is translation invariant, it is not surprising that 
balls do not change their shape when translated. 


Proposition 7.5 


All balls in a normed space have the same convex shape:

$$B_r(x) = x + rB_1(0),$$
$$B_r(x) + B_s(y) = B_{r+s}(x + y), \qquad \lambda B_r(x) = B_{|\lambda| r}(\lambda x).$$

Proof The norm axioms can be recast as axioms for the shape of balls. The
translation-invariance and scaling-homogeneity of the distance are equivalent to

$$\begin{aligned}
B_r(x + a) &= \{ y : d(y, x + a) < r \} = \{ y : d(y - a, x) < r \} \\
&= \{ a + z : d(z, x) < r \} = B_r(x) + a,
\end{aligned}$$

$$\lambda B_1(0) = \{ \lambda y : \|y\| < 1 \} = \{ z : \|z\| < |\lambda| \} = B_{|\lambda|}(0), \quad (\lambda \ne 0).$$

Combining the two gives $B_r(a) = a + rB_1(0)$, showing that all balls have the same
shape as the ball of radius 1 centered at the origin.

The third norm axiom is equivalent to $\bigcap_{r>0} B_r(0) = \{0\}$, while the triangle
inequality becomes $B_r(0) + B_s(0) = B_{r+s}(0)$ since

$$\|x\| < r \text{ AND } \|y\| < s \ \Rightarrow\ \|x + y\| < r + s,$$

$$\|x\| < r + s \ \Rightarrow\ x = \frac{r}{r+s}x + \frac{s}{r+s}x \in B_r(0) + B_s(0).$$

Recasting this equation as $(r + s)B_1(0) = rB_1(0) + sB_1(0)$ for $r, s > 0$ shows that
$B_1(0)$, and hence all other balls, are convex: for $x, y \in B_1(0)$ and $0 \le t \le 1$,

$$(1 - t)x + ty \in (1 - t)B_1(0) + tB_1(0) = B_1(0).$$

In particular, $B_r(x) + B_s(y) = x + rB_1(0) + y + sB_1(0) = B_{r+s}(x + y)$,
$\lambda B_r(x) = \lambda x + \lambda r B_1(0) = B_{|\lambda| r}(\lambda x)$. □


The unit ball is often denoted by $B_X := B_1(0)$ and takes a central role as representative
of all other balls; it contains all the information about the norm of $X$.


Examples 7.6 


1. The boundary of a ball $B_r(x)$ is the sphere $S_r(x) := \{ y \in X : d(x, y) = r \}$.
Any point on the sphere has nearby points inside and outside the ball ($(1 - \epsilon)y$
and $(1 + \epsilon)y$). Thus $\overline{B_r(x)} = B_r[x] := \{ y \in X : d(x, y) \le r \}$.


2. * Balls can have quite counter-intuitive properties. For example, consider the
path of functions $f_t(x) := 2|x - t| - 1$ in $C[0, 1]$, starting from the function
$f_0(x) = 2x - 1$ and ending at the function $f_1 = -f_0$. It lies on the unit sphere
of $C[0, 1]$, but has a total length equal to the distance between $f_0$ and $f_1$,

$$\text{length} = \int_0^1 \Big\| \frac{d}{dt} f_t \Big\|\,dt = \int_0^1 2\,dt = 2,$$
$$\text{distance} = \|f_1 - f_0\|_{C[0,1]} = 2\|f_0\|_{C[0,1]} = 2.$$


Exercises 7.7 


1. Prove that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are norms. Which axiom does $\|\cdot\|_p$ fail when $p < 1$?

2. What do the unit balls of $\mathbb{R}^2$ in each norm of Example 7.3(2) look like?


3. The sequence $(1, 1, \dots, 1, 0, 0, \dots)$ is not a good approximation to the constant
sequence $(1, 1, \dots)$ in $\ell^\infty$; but $(1 - \epsilon, 1 - \epsilon, \dots)$ is.


4. The norm axioms for $\ell^1$ and $\ell^\infty$ are, when interpreted correctly,

$$\begin{array}{ll}
\sum_n |a_n + b_n| \le \sum_n |a_n| + \sum_n |b_n|, & \sup_n |a_n + b_n| \le \sup_n |a_n| + \sup_n |b_n|, \\
\sum_n |\lambda a_n| = |\lambda| \sum_n |a_n|, & \sup_n |\lambda a_n| = |\lambda| \sup_n |a_n|, \\
\sum_n |a_n| = 0 \Leftrightarrow \forall n,\ a_n = 0, & \sup_n |a_n| = 0 \Leftrightarrow \forall n,\ a_n = 0.
\end{array}$$

Prove these, assuming any results about series (Section 7.5). What are they for $\ell^2$?

5. A set $A$ is bounded when there is a $c > 0$ such that $\forall x \in A$, $\|x\| \le c$ (Section 6.1).
A non-trivial normed space is not bounded.


6. For any subset $A$, and $\epsilon > 0$, $A + B_\epsilon(0)$ is an open set containing $A$.


7. ▸ The norms $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are all equivalent on $\mathbb{R}^N$ since (prove!)

$$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le N\|x\|_\infty.$$

But they are not equivalent for sequences or functions! Find sequences of functions
that converge in $L^1[0, 1]$ but not in $L^\infty[0, 1]$, or vice-versa. Can sequences
converge in $\ell^1$ but not in $\ell^\infty$?


8. * Minkowski semi-norm: Let $C$ be a convex set which is balanced, $e^{i\theta}C = C$
($\forall \theta \in \mathbb{R}$), and such that $\bigcup_{r>0} rC = X$. Then

$$\|x\| := \inf\{ r > 0 : x \in rC \}$$

is a semi-norm on $X$.
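
Before attempting the proof asked for in Exercise 7, the chain of inequalities between the three norms on $\mathbb{R}^N$ can be tested on random vectors. The Python sketch below does this for vectors of small dimension; the names `norms` and `chain_holds`, the sample sizes, and the tolerance are assumptions made for this illustration.

```python
import random

def norms(vector):
    """Return the 1-norm, 2-norm and max-norm of a real vector."""
    magnitudes = [abs(v) for v in vector]
    return (sum(magnitudes),
            sum(m * m for m in magnitudes) ** 0.5,
            max(magnitudes))

def chain_holds(vector, tol=1e-12):
    """Check max-norm <= 2-norm <= 1-norm <= N * max-norm, up to rounding."""
    n1, n2, ninf = norms(vector)
    return (ninf <= n2 + tol and n2 <= n1 + tol
            and n1 <= len(vector) * ninf + tol)

random.seed(1)
samples = [[random.uniform(-10, 10) for _ in range(random.randint(1, 8))]
           for _ in range(1000)]
all_hold = all(chain_holds(v) for v in samples)
print(all_hold)
```

A vector with all entries equal, such as $(1, 1, 1)$, shows that the last inequality can hold with equality.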


7.3 Metric and Vector Properties 


By construction, normed spaces are metric spaces, as well as vector spaces. We 
can apply ideas related to both, in particular open/closed sets, convergence, com- 
pleteness, continuity, connectedness, and compactness, as well as linear subspaces, 
linear independence and spanning sets, convexity, linear transformations, etc. Many 
of these notions have better characterizations in normed spaces, as the following 
propositions attest. 


Proposition 7.8 
Vector addition, $(x, y) \mapsto x + y$, $X^2 \to X$,
scalar multiplication, $(\lambda, x) \mapsto \lambda x$, $\mathbb{F} \times X \to X$,
and the norm $x \mapsto \|x\|$, $X \to \mathbb{R}$,

are continuous.


Proof Vector addition and the norm are in fact Lipschitz maps,

$$\|(x_1 + y_1) - (x_2 + y_2)\| \le \|x_1 - x_2\| + \|y_1 - y_2\| = \|(x_1, y_1) - (x_2, y_2)\|_{X^2},$$
$$\big|\, \|x\| - \|y\| \,\big| \le \|x - y\|.$$

Scalar multiplication is continuous: for any $\epsilon > 0$, take $|\lambda - \mu|$ to be smaller than
$\min(\epsilon/3(1 + \|x\|), 1)$ and $\|x - y\| < \min(\epsilon/3(1 + |\lambda|), 1)$, to get

$$\begin{aligned}
\|\lambda x - \mu y\| &\le \|\lambda x - \mu x\| + \|\mu x - \mu y\| \\
&= |\lambda - \mu|\|x\| + |\mu|\|x - y\| \\
&\le |\lambda - \mu|\|x\| + |\lambda|\|x - y\| + |\lambda - \mu|\|x - y\| \\
&< \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.
\end{aligned}$$

□


Corollary 7.9 


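When $(x_n)$ and $(y_n)$ converge,

$$\lim_{n\to\infty} (x_n + y_n) = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n,$$
$$\lim_{n\to\infty} \lambda x_n = \lambda \lim_{n\to\infty} x_n,$$
$$\lim_{n\to\infty} \|x_n\| = \big\| \lim_{n\to\infty} x_n \big\|.$$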


Of particular importance are closed linear subspaces, because they are “closed” not
only with respect to the algebraic operations of addition $+$ and scalar multiplication
$\cdot$, but also with respect to convergence $\to$.


Proposition 7.10 


If $M$ is a linear subspace of $X$, then so is $\overline{M}$.

$\overline{[\![A]\!]}$ is the smallest closed linear subspace containing $A$.


Proof (i) Let $x, y \in \overline{M}$, with sequences $x_n \in M$, $y_n \in M$ converging to them,
$x_n \to x$ and $y_n \to y$ (Proposition 3.4). But $x_n + y_n$ and $\lambda x_n$ both belong to $M$, so

$$x + y = \lim_{n\to\infty} x_n + \lim_{n\to\infty} y_n = \lim_{n\to\infty} (x_n + y_n) \in \overline{M},$$
$$\lambda x = \lambda \lim_{n\to\infty} x_n = \lim_{n\to\infty} (\lambda x_n) \in \overline{M}.$$

Thus $\overline{M}$ is closed under vector addition and scalar multiplication. In particular this
holds when $M$ is generated by $A$.

(ii) $[\![A]\!]$ is the smallest linear subspace containing $A$, and $\overline{[\![A]\!]}$ is the smallest closed
set containing $[\![A]\!]$. So any closed linear subspace containing $A$ must also contain
$[\![A]\!]$, and its closure $\overline{[\![A]\!]}$. □

Examples 7.11

1. The following sets are closed linear subspaces of their respective spaces:
(a) $A := \{ (a_i) \in \ell^1 : \sum_{i=0}^\infty a_i = 0 \}$,
(b) $B := \{ f \in C[a, b] : f(a) = f(b) \}$.

The proofs for closure (linearity is left as an exercise) depend on the following
inequalities that hold when $a_n \to a$ in $\ell^1$, $a_n \in A$, and $f_n \to f$ in $C[a, b]$,
$f_n \in B$,

$$\Big| \sum_{i=0}^\infty a_i \Big| = \Big| \sum_{i=0}^\infty a_{in} + \sum_{i=0}^\infty (a_i - a_{in}) \Big| \le \sum_{i=0}^\infty |a_i - a_{in}| = \|a - a_n\|_{\ell^1} \to 0,$$

$$f(a) = \lim_{n\to\infty} f_n(a) = \lim_{n\to\infty} f_n(b) = f(b).$$


2. * If $M$ and $N$ are closed subsets of a normed space, $M + N$ need not be closed
(see also Exercise 7.14(5)).

(i) Let $f : X \to Y$ be a continuous function between normed spaces; let
$M := \{ (x, f(x)) : x \in X \}$, $N := \{ (x, 0) : x \in X \}$; they are closed subsets
of $X \times Y$ (prove!). But $M + N = \{ (x, f(x')) : x, x' \in X \}$ is closed if, and
only if, $\operatorname{im} f$ is closed, which need not be the case. To take a specific example,

$$\{ (x, 0) : x \in \mathbb{R} \} + \{ (x, e^x) : x \in \mathbb{R} \} = \mathbb{R} \times\, ]0, \infty[.$$

(ii) This is true even if $M$, $N$ are linear subspaces. Let $M$ be the set of bounded
sequences $(a_1, 0, a_2, 0, \dots)$ whose even terms vanish, and let $N$ consist of
bounded sequences of the type $(a_1, a_1/1^2, a_2, a_2/2^2, a_3, a_3/3^2, \dots)$. They are both
closed subspaces of $\ell^\infty$ (check!). Now consider

$$\begin{aligned}
x_n &:= (1, 1, 2, \tfrac12, 3, \tfrac13, \dots, n, \tfrac1n, 0, 0, \dots) \in N \\
y_n &:= (1, 0, 2, 0, 3, 0, \dots, n, 0, 0, 0, \dots) \in M \\
x_n - y_n &= (0, 1, 0, \tfrac12, 0, \tfrac13, \dots, 0, \tfrac1n, 0, 0, \dots) \in M + N
\end{aligned}$$

$x_n - y_n$ converges to the bounded sequence $(0, 1, 0, \tfrac12, \dots)$ which cannot be
expressed as a vector in $M + N$ (its even terms would force $a_k = k$, which is unbounded).


Connected and Compact Subsets

Recall that connected sets may be complicated objects in general metric spaces. This
is still true in normed spaces, but at least for open subsets, connectedness reduces to
path-connectedness, which is more intuitive and usually easier to prove.

Proposition 7.12


An open connected set in a normed space is path-connected. 


Proof Let $C$ be a non-empty open connected set in $X$. Recall that “path-connected”
means that any two points in $C$ can be joined by a continuous path $r : [0, 1] \to C$
starting at one point and ending at the other. Fix any $x \in C$, and let $P$ be the subset
of $C$ consisting of those points that are path-connected to $x$. We wish to show that
$P = C$.

$P$ has no boundary in $C$: Given any boundary point $z$ of $P$, there is a ball
$B_\epsilon(z) \subseteq C$ since $C$ is open, and thus a point $y \in P$ in the ball. This means that there is
a path $r$ from $x$ to $y$. In normed spaces, it is obvious that balls, like all convex sets,
are path-connected (by straight paths). So we can extend the path $r$ to one
that starts from $x$ and ends at any other $w \in B_\epsilon(z)$, simply by adjoining the straight
line at the end. More rigorously, the function $\tilde{r} : [0, 1] \to C$ defined by

$$\tilde{r}(t) := \begin{cases} r(2t) & t \in [0, \tfrac12] \\ y + (2t - 1)(w - y) & t \in\, ]\tfrac12, 1] \end{cases}$$

is continuous. So $z$ is surrounded by points of $P$, a contradiction.

But a connected set such as $C$, cannot contain a subset, such as $P$, without
a boundary (Proposition 5.3), unless $P = \emptyset$ (which is not the case here) or
$P = C$. □


There is quite a bit to say about bounded and totally bounded sets. As we will 
see later on, they are the same in finite dimensional normed spaces, but in infinite 
dimensional ones, no open set can be totally bounded, although balls are bounded sets. 
For now, let us show that translations and scalings of bounded and totally bounded 
sets remain so. 


Proposition 7.13 


If $A$, $B$ are both bounded, totally bounded, or compact sets, then so are,
respectively, $\lambda A$ and $A + B$.


Proof Proposition 7.5 is used throughout the following.
Boundedness: If $A \subseteq B_r(x)$ and $B \subseteq B_s(y)$, then

$$\lambda A \subseteq \lambda B_r(x) = B_{|\lambda| r}(\lambda x),$$
$$A + B \subseteq B_r(x) + B_s(y) = B_{r+s}(x + y).$$

Total boundedness:

$$\lambda A \subseteq \lambda \bigcup_{i=1}^N B_{\epsilon/|\lambda|}(x_i) = \bigcup_{i=1}^N B_\epsilon(\lambda x_i),$$

$$A + B \subseteq \bigcup_{i=1}^N B_{\epsilon/2}(x_i) + \bigcup_{j=1}^M B_{\epsilon/2}(y_j) = \bigcup_{i,j} B_\epsilon(x_i + y_j).$$

Compactness: If $A$ is compact, then scalar multiplication, being continuous, sends
it to the compact set $\lambda A$ (Proposition 6.15). If $B$ is also compact, then $A \times B$ is
compact (Exercise 6.22(16)), and vector addition, being continuous, maps it to the
compact set $A + B$. □

Exercises 7.14 
1. Show that the following sets are closed subspaces of their respective spaces: 

(a) $\{ (a_i) \in \ell^\infty : a_0 = 0 \}$,

(b) $\{ (a_i) \in \ell^2 : a_1 = a_3 \text{ AND } a_0 = \sum_{i=1}^\infty a_i/i \}$,

(c) $\{ f \in C[0, 1] : \int_0^1 f = 0 \}$.


2. The set of polynomials in $x$ forms a linear subspace of $C[0, 1]$. Its dimension is
infinite because the elements $1, x, x^2, \dots$ are linearly independent. Is it closed,
or if not, what could be the closure of the polynomials in this space?


3. The convex hull of a closed set need not be closed; a counterexample is given by
$(\mathbb{R} \times \{0\}) \cup \{(0, 1)\}$. But the closure of a convex set $C$ is convex.


4. Line segments are path-connected; so linear subspaces and convex subsets (such 
as balls) are connected. 


5. The continuity of $+$ and $\lambda\cdot$ implies that $\lambda\overline{A} = \overline{\lambda A}$ and $\overline{A} + \overline{B} \subseteq \overline{A + B}$. Find an
example to show that equality need not necessarily hold.


7.4 Complete and Separable Normed Vector Spaces 
Definition 7.15 


When the induced metric $d(x, y) := \|x - y\|$ is complete, the normed space
is called a Banach space.

Stefan Banach (1892-1945) After WW1, at 24 years, a chance
event led him to meet Steinhaus, who had studied under Hilbert
in 1911, and was then at Krakow university. His 1920 thesis
on abstract normed real vector spaces earned him a post
at the University of Lwow; working mostly in the “Scottish
café”, he continued research on “linear operations”, where he
introduced weak convergence and proved various theorems such
as the Hahn-Banach, Banach-Steinhaus, Banach-Alaoglu, his
fixed-point theorem, and the Banach-Tarski paradox.

Fig. 7.1 Banach


Examples 7.16 


1. ▸ $\mathbb{R}^N$ and $\mathbb{C}^N$ are separable Banach spaces. It is later shown (Section 9.1.4) that
the sequence spaces $\ell^p$ and the Lebesgue function spaces $L^p[0, 1]$ ($1 \le p < \infty$)
are also separable Banach spaces, but $\ell^\infty$ is a non-separable Banach space
(Theorem 9.1).

2. A closed linear subspace of a Banach space is itself a Banach space
(Proposition 4.7).

3. ▸ When $X$, $Y$ are Banach spaces over the same field, so is $X \times Y$ (Proposition 4.7).

4. $C_b(X, Y)$ is a Banach space whenever $Y$ is (Theorem 6.23).

5. Not every normed space is complete (when infinite dimensional).


(i) The set $c_{00}$ of finite sequences $(a_0, \dots, a_n, 0, 0, \dots)$, $n \in \mathbb{N}$, is an
incomplete linear subspace of $\ell^\infty$. For example, the vectors $(1, 0, 0, \dots)$,
$(1, \tfrac12, 0, 0, \dots)$, ..., $(1, \tfrac12, \dots, \tfrac1n, 0, 0, \dots)$, ..., form a Cauchy sequence which
does not converge in $c_{00}$.

(ii) Take the vector space of continuous functions $C[-1, 1]$ with the 1-norm
$\|f\| := \int_{-1}^1 |f(x)|\,dx$. This is indeed a norm but it is not complete on that space.
For consider the sequence of continuous functions defined by

$$f_n(x) := \begin{cases} 0 & -1 \le x < 0 \\ nx & 0 \le x < 1/n \\ 1 & 1/n \le x \le 1 \end{cases}$$

It is Cauchy:

$$\|f_n - f_m\| = \int_{-1}^1 |f_n - f_m| = \frac12 \Big| \frac1n - \frac1m \Big| \to 0, \quad \text{as } n, m \to \infty$$

but were it to converge to some $f \in C[-1, 1]$, i.e., $\int_{-1}^1 |f_n(x) - f(x)|\,dx \to 0$,
then

$$\int_{-1}^0 |f(x)|\,dx = 0 = \int_{1/n}^1 |1 - f(x)|\,dx,$$

so that $f(x) = 0$ on $[-1, 0[$ and $f(x) = 1$ on $]0, 1]$, implying it is discontinuous.
Similarly the set $C[a, b]$ is not closed as a linear subspace of $L^1[a, b]$.
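
The Cauchy estimate for the ramp functions $f_n$ can be checked numerically. The following Python sketch approximates the 1-norm distance between two of them by a midpoint rule and compares it with $\frac12|\frac1n - \frac1m|$; the helper names `ramp` and `l1_distance`, the step count, and the sample indices 10 and 40 are assumptions made for illustration.

```python
def ramp(n):
    """Return f_n: 0 on [-1, 0), n*x on [0, 1/n), and 1 on [1/n, 1]."""
    def f(x):
        if x < 0:
            return 0.0
        if x < 1.0 / n:
            return n * x
        return 1.0
    return f

def l1_distance(f, g, a=-1.0, b=1.0, steps=200_000):
    """Midpoint-rule approximation of the integral of |f - g| over [a, b]."""
    width = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * width
        total += abs(f(x) - g(x))
    return width * total

distance = l1_distance(ramp(10), ramp(40))
expected = 0.5 * abs(1 / 10 - 1 / 40)
print(distance, expected)
```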


Proposition 7.17 


Every normed space can be completed to a Banach space. 


Proof Let $\widetilde{X}$ be the completion of the normed space $X$ (Theorem 4.6). We need to
prove that vector addition, scalar multiplication and the norm on $X$ can be extended
to $\widetilde{X}$. Using the notation of Theorem 4.6, let $x = [x_n]$, $y = [y_n]$ be elements of $\widetilde{X}$,
with $(x_n)$, $(y_n)$ Cauchy sequences in $X$. Since

$$\|x_n + y_n - x_m - y_m\| \le \|x_n - x_m\| + \|y_n - y_m\| \to 0$$
$$\|\lambda x_n - \lambda x_m\| = |\lambda|\|x_n - x_m\| \to 0$$
$$\big|\, \|x_n\| - \|x_m\| \,\big| \le \|x_n - x_m\| \to 0,$$

as $n, m \to \infty$, we find that $(x_n + y_n)$, $(\lambda x_n)$ and $(\|x_n\|)$ are all Cauchy sequences.
For the same reasons, if $(x'_n)$ is asymptotic to $(x_n)$, and $(y'_n)$ to $(y_n)$, then $(x'_n + y'_n)$
and $(x_n + y_n)$, $(\lambda x'_n)$ and $(\lambda x_n)$, and $\|x'_n\|$ and $\|x_n\|$, are asymptotic to each other,
respectively. So we can define

$$x + y := [x_n + y_n], \qquad \lambda x := [\lambda x_n], \qquad \|x\| := \lim_{n\to\infty} \|x_n\|.$$

Note that $d(x, y) = \|x - y\|$. It is easy to check that they give a legitimate vector
addition, scalar multiplication and a norm; the required axioms follow from the same
properties in $X$ and the continuity of these operations, e.g.

$$\|x + y\| = \lim_{n\to\infty} \|x_n + y_n\| \le \lim_{n\to\infty} (\|x_n\| + \|y_n\|) = \|x\| + \|y\|,$$

$$\|x\| = 0 \ \Rightarrow\ \|x_n\| \to 0 \ \Rightarrow\ x = [x_n] = [0] = 0.$$

Note that the zero can be represented by the Cauchy sequence $(0)$, and $-x$ by
$(-x_n)$. Furthermore, recall that there is a copy of $X$ in $\widetilde{X}$ (as constant sequences);
the operations just defined on $\widetilde{X}$ reduce to the given operations on $X$, when restricted
to it. □


Proposition 7.18 


A normed space $X$ is separable if, and only if, there is a countable subset
$A$ such that $X = \overline{[\![A]\!]}$.

Proof If $X = \overline{A}$, such as when $X$ is separable, then $X = \overline{A} \subseteq \overline{[\![A]\!]} \subseteq X$.
Conversely, suppose $X = \overline{[\![A]\!]}$ with $A$ countable; this means that for any vector
$x$, there is a linear combination of $a_i \in A$ ($a_i \ne 0$), such that

$$\|\lambda_1 a_1 + \cdots + \lambda_n a_n - x\| < \epsilon \qquad \lambda_i \in \mathbb{R} \text{ or } \mathbb{C}. \qquad (7.1)$$

$[\![A]\!]$ is not countable (unless $A \subseteq \{0\}$), but the set of (finite) linear combinations of
vectors in $A$ using coefficients from $\mathbb{Q} + i\mathbb{Q}$ is countable (why? hint: $\bigcup_n (\mathbb{Q}^2)^n$ is
countable). Choosing $r_i = p_i + iq_i \in \mathbb{Q} + i\mathbb{Q}$, such that $|r_i - \lambda_i| < \epsilon/n\|a_i\|$, and
combining with (7.1), we get

$$\|r_1 a_1 + \cdots + r_n a_n - x\| \le |r_1 - \lambda_1|\|a_1\| + \cdots + |r_n - \lambda_n|\|a_n\| + \epsilon < 2\epsilon.$$

This shows that $X$ is separable. □


7.5 Series 


Sequences and convergence play a big role in metric spaces. Normed spaces allow
sequences to be combined with summation, thereby obtaining series $x_1 + \cdots + x_n$.


Definition 7.19 


A series $\sum_n x_n$ is a sequence of vectors in a normed space obtained by addition,
$(x_1, x_1 + x_2, x_1 + x_2 + x_3, \dots)$; the $N$th term of the sequence is denoted by
$\sum_{n=1}^N x_n$. Therefore, a series converges when $\big\| x - \sum_{n=1}^N x_n \big\| \to 0$ for some $x \in X$
as $N \to \infty$; in this case the limit $x$ is called its sum.

A series is said to converge absolutely when $\sum_n \|x_n\|$ converges in $\mathbb{R}$.


Examples 7.20 
1. We can convert some results about convergence of sequences to series: 


(a) $\sum_n (x_n + y_n) = \sum_n x_n + \sum_n y_n$ when the latter converge; similarly,
$\sum_n \lambda x_n = \lambda \sum_n x_n$.

(b) A series is Cauchy when $x_n + \cdots + x_m \to 0$ as $n, m \to \infty$.




2. If a series converges both normally and absolutely, then $\big\| \sum_{n=1}^\infty x_n \big\| \le \sum_{n=1}^\infty \|x_n\|$.

Proof Take the limit of $\|x_1 + \cdots + x_n\| \le \|x_1\| + \cdots + \|x_n\|$ as $n \to \infty$.


3. There are series that converge but not absolutely. As an example, take any decreasing
sequence of positive real numbers $a_n \to 0$, then $\sum_n (-1)^n a_n$ converges in $\mathbb{R}$
(Leibniz); yet $\sum_n a_n$ may diverge.
Indeed, when $\sum_n a_n = \infty$ and $0 < a_n \to 0$, the series $\sum_n \pm a_n$ can converge to
any $a \in \mathbb{R}$ by a judicious choice of signs. Take enough terms $a_n$ to just exceed $a$,
then reverse sign to lower the sum to just less than $a$, then reverse sign again and
continue.
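
The choice of signs described above is easy to simulate. The Python sketch below applies it to the harmonic terms $a_n = 1/n$ (an illustrative choice, as are the function name `signed_partial_sum`, the targets, and the number of terms): each term is added while the running sum is at or below the target and subtracted otherwise.

```python
def signed_partial_sum(target, terms):
    """Add the next term while at or below the target, otherwise subtract it."""
    total = 0.0
    for term in terms:
        total += term if total <= target else -term
    return total

harmonic_terms = [1.0 / n for n in range(1, 10_001)]
near_positive = signed_partial_sum(2.5, harmonic_terms)
near_negative = signed_partial_sum(-1.0, harmonic_terms)
print(near_positive, near_negative)   # close to 2.5 and -1.0 respectively
```

Once the running sum has crossed the target, its distance from the target is controlled by the size of the most recent terms, which is why the partial sums settle near the chosen value.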


4. A rearrangement of a series need not converge; even if it does, it need not have 
the same sum. For example, 


I fH bbe baba d a bbe tog, 
I-}4elehe palate soo, 
1} pt baba f ad hte foe, 
I-}adag-be pega bee 1 


5. The sum of a ‘sequence’ $(x_n)_{n \in \mathbb{Z}}$ can also be given a meaning:

$$\sum_{n \in \mathbb{Z}} x_n = \sum_{n=-\infty}^\infty x_n := \sum_{n=1}^\infty x_{-n} + \sum_{n=0}^\infty x_n,$$

when the latter two series converge.


In general, absolute convergence does not imply, nor is it implied by, convergence
of $\sum_n x_n$. But for Banach spaces, one implication holds:


Proposition 7.21 


A normed space X is complete if, and only if, any absolutely convergent 
series in X converges. 


Proof Let $X$ be a Banach space, and suppose that $\sum_n \|x_n\|$ converges. Let $y_N :=
\sum_{n=1}^N x_n$, so that for $M > N$

$$\|y_M - y_N\| = \Big\| \sum_{n=N+1}^M x_n \Big\| \le \sum_{n=N+1}^M \|x_n\| \to 0 \quad \text{as } N, M \to \infty.$$

Hence $(y_N)$ is a Cauchy sequence in the complete space $X$, and so converges.

Conversely, let $X$ be a normed space for which every absolutely convergent series
converges. Let $(x_n)$ be a Cauchy sequence in $X$, so that for $n, m \ge N$, large enough,
$\|x_n - x_m\| < \epsilon$. Letting $\epsilon := 1/2^r$, $r = 1, 2, \dots$, we can find ever larger numbers $n_r$
such that $\|x_{n_{r+1}} - x_{n_r}\| < 1/2^r$. Thus,

$$\sum_{r=1}^\infty \|x_{n_{r+1}} - x_{n_r}\| \le \sum_{r=1}^\infty \frac{1}{2^r} = 1.$$

By assumption, since its absolute series converges, so does $\sum_r (x_{n_{r+1}} - x_{n_r})$, i.e.,

$$x_{n_{r+1}} - x_{n_1} = (x_{n_2} - x_{n_1}) + (x_{n_3} - x_{n_2}) + \cdots + (x_{n_{r+1}} - x_{n_r})$$

converges as $r \to \infty$. This forces the subsequence $x_{n_r}$ to converge, and so must the
parent Cauchy sequence $(x_n)$ (Proposition 4.2). □


Series can be used to extend the idea of a basis as follows: a fixed list of unit vectors
$e_n$ is called a Schauder basis when for any $x \in X$ there are unique coefficients $\alpha_n$
such that

$$x = \sum_{n=1}^\infty \alpha_n e_n.$$

This implies that $X = \overline{[\![e_1, e_2, \dots]\!]}$, and by necessity $X$ must be separable (though
not every separable space has a Schauder basis [41]). Since a vector $x = \sum_n \alpha_n e_n$
is identified by its sequence of coefficients $(\alpha_n)$ with respect to a Schauder basis, the
space $X$ is essentially a sequence space (with norm $\|(\alpha_n)\| := \|\sum_n \alpha_n e_n\|_X$). There
are cases where a permutation of a Schauder basis does not remain a basis; if it does,
the basis is termed unconditional; again, not every space has an unconditional basis
(e.g. $L^1(\mathbb{R})$ and $C[0, 1]$).


Convergence Tests 


Real series are easier to handle than series of vectors, and a number of tests for 
absolute convergence have been devised: 

Comparison Test. If $\|x_n\| \le \|y_n\|$ then $\sum_{n=0}^N \|x_n\| \le \sum_{n=0}^N \|y_n\|$. If the latter
converges to $\sum_{n=0}^\infty \|y_n\|$, then the partial sums of $\sum_n \|x_n\|$ are increasing and bounded
above, so converge.

An important special case is comparison with the geometric series, $\|x_n\| \le r^n$
with $r < 1$, because $1 + r + r^2 + \cdots = 1/(1 - r)$. This leads to:


Root Test. Let $r := \limsup_n \|x_n\|^{1/n}$,

(a) if $r < 1$ then the series $\sum_n x_n$ is absolutely convergent,
(b) if $r = 1$ then the series may or may not converge,
(c) if $r > 1$ then the series diverges.


Juliusz Schauder (1899-1943), after fighting in WWI, graduated at 24 years from the University of Lwów under Steinhaus with a dissertation on statistics. He continued researching in the Banach/Steinhaus school, giving the theory of compact operators its modern shape; he proved that the adjoint of a compact operator is compact, the Schauder fixed point theorem, and generalized aspects of orthonormal bases to Banach spaces; later he specialized to partial differential equations. Along with many other Polish academics, he was killed by the Nazis during WWII.

Fig. 7.2. Schauder


Proof (a) $\|x_n\| \le (r + \epsilon)^n$ except for finitely many terms. Since the right-hand side is a convergent geometric series when $r < 1$ and $\epsilon$ is taken small enough, the left-hand side series also converges by comparison.

(b) The series $\sum_n \frac{1}{n} = \infty$ and $\sum_n \frac{1}{n^2} \le 2$ both have $r = 1$.

(c) When $r > 1$, $\|x_n\| \ge (r - \epsilon)^n \ge 1$ for infinitely many terms, so the series $\sum_n \|x_n\|$ cannot possibly converge.


Ratio Test. (D'Alembert's) If the ratios $\|x_{n+1}\|/\|x_n\| \to r$ then $\|x_n\|^{1/n} \to r$; it is often easier to find the first limit, if it exists, than the second.


Proof The idea is that for large $n$, $\|x_n\| \approx r\|x_{n-1}\| \approx r^n\|x_0\|$, so $\|x_n\|^{1/n} \approx r$. More precisely, for $n \ge N$ large enough,

$$r - \epsilon < \|x_n\|/\|x_{n-1}\| < r + \epsilon,$$
$$(r - \epsilon)^{n-N}\|x_N\| < \|x_n\| < (r + \epsilon)^{n-N}\|x_N\|,$$
$$r - 2\epsilon < \|x_n\|^{1/n} < r + 2\epsilon,$$

since $(r \pm \epsilon)^{-N/n}\|x_N\|^{1/n} \to 1$.
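
As a numerical illustration of the ratio test, the following sketch compares the two quantities for the sequence $x_n = n/2^n$. This sequence and the function name `ratio_and_root` are chosen here for illustration and are not taken from the text. Both quantities approach $r = 1/2$, the ratio noticeably faster than the root.

```python
def ratio_and_root(n):
    """Return (|x_{n+1}| / |x_n|, |x_n|**(1/n)) for the test sequence x_n = n / 2**n."""
    x_n = n / 2.0 ** n
    x_next = (n + 1) / 2.0 ** (n + 1)
    return x_next / x_n, x_n ** (1.0 / n)


# Both columns tend to 1/2; the ratio converges faster than the n-th root.
for n in (10, 100, 1000):
    ratio, root = ratio_and_root(n)
    print(n, ratio, root)
```

This is consistent with the remark that the first limit is often the easier one to find.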
Cauchy's Test. If $\|x_n\|$ is decreasing, then $\sum_n \|x_n\|$ converges $\Leftrightarrow$ $\sum_n 2^n\|x_{2^n}\|$ converges.


Proof Let $r_n := \|x_n\|$; the test follows from two comparisons,

$$r_1 + r_2 + \cdots + r_{2^{n+1}-1} = r_1 + (r_2 + r_3) + \cdots + (r_{2^n} + \cdots + r_{2^{n+1}-1}) \le r_1 + 2r_2 + \cdots + 2^n r_{2^n},$$

$$r_1 + 2r_2 + 4r_4 + \cdots + 2^n r_{2^n} \le r_1 + 2r_2 + 2(r_3 + r_4) + \cdots + 2(r_{2^{n-1}+1} + \cdots + r_{2^n}) \le 2(r_1 + r_2 + \cdots + r_{2^n}).$$
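
The two comparisons in the proof of Cauchy's test can be checked numerically. The sketch below is only an illustration: the function name `condensation_sums` and the two test sequences $r_k = 1/k^2$ and $r_k = 1/k$ are assumptions made here. It computes the three sums involved, so that one can verify $A \le B \le 2C$.

```python
def condensation_sums(r, n):
    """Return (A, B, C) for a decreasing positive sequence r(k), k >= 1, where
    A = r_1 + r_2 + ... + r_{2^(n+1) - 1},
    B = r_1 + 2 r_2 + 4 r_4 + ... + 2^n r_{2^n},
    C = r_1 + r_2 + ... + r_{2^n}.
    """
    A = sum(r(k) for k in range(1, 2 ** (n + 1)))
    B = sum(2 ** j * r(2 ** j) for j in range(n + 1))
    C = sum(r(k) for k in range(1, 2 ** n + 1))
    return A, B, C


# For r_k = 1/k^2 the condensed sum B stays below 2; for r_k = 1/k it equals n + 1.
for n in (1, 5, 10):
    print(n, condensation_sums(lambda k: 1.0 / k ** 2, n))
    print(n, condensation_sums(lambda k: 1.0 / k, n))
```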
Kummer's Test. Let $\sum_n \frac{1}{r_n}$ be a divergent series of positive terms and

$$r_n \frac{\|x_n\|}{\|x_{n+1}\|} - r_{n+1} \to \alpha.$$

If $\alpha > 0$, then the series $\sum_n x_n$ converges absolutely, otherwise when $\alpha < 0$ the series diverges. For example, $r_n := 1$ gives the ratio test, $r_n := n$ is Gauss's or Raabe's test, and $r_n := n \log n$ is Bertrand's test.


Proof When $\alpha > 0$, we are given that $c\|x_{n+1}\| \le r_n\|x_n\| - r_{n+1}\|x_{n+1}\|$ for $n \ge N$ large enough, and some $0 < c < \alpha$. Summing up these inequalities for $n = N, \ldots, m$ results in

$$c(\|x_{N+1}\| + \cdots + \|x_{m+1}\|) \le r_N\|x_N\| - r_{m+1}\|x_{m+1}\| \le r_N\|x_N\|$$

so the series converges as it is increasing but bounded above.
When $\alpha < 0$, we have $r_n\|x_n\| \le r_{n+1}\|x_{n+1}\|$ for $n \ge N$ large enough. Hence

$$\|x_n\| \ge \frac{r_N\|x_N\|}{r_n}$$

and the series diverges by comparison with the series $\sum_n \frac{1}{r_n}$.


There are other tests, for example, Cauchy's inequality shows that $\sum_n a_n b_n$ converges when $\sum_n a_n^2$ and $\sum_n b_n^2$ do.


Exercises 7.22 


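1. If a series $\sum_n x_n$ converges, then $x_n \to 0$. The converse is false:

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots = \infty, \quad\text{yet } \frac{1}{n} \to 0.$$

More generally, for any fixed $k$, $x_N + x_{N+1} + \cdots + x_{N+k} \to 0$ and $\sum_{n=N}^{\infty} x_n \to 0$, as $N \to \infty$.

2. If $\sum_n \|x_{n,m} - x_n\| \to 0$ as $m \to \infty$, then $\lim_{m\to\infty} \sum_n x_{n,m} = \sum_n \lim_{m\to\infty} x_{n,m} = \sum_n x_n$, if the latter converges.

3. From the geometric series, it follows that $1 - a + a^2 - a^3 + \cdots$ and $\sum_n a^{r_n}$ ($r_n \ge n$) converge for $|a| < 1$ in $\mathbb{R}$.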


4. The series 1 + x + a Shae? 5 De ara all De z converge by comparison with 
a geometric series (or using the ratio test). 


5. $1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}$. This series was too hard to sum before Euler; show at least that it converges, using the comparison $\frac{1}{n^2} < \frac{1}{n(n-1)} = \frac{1}{n-1} - \frac{1}{n}$.


Generalize this to the case je 7 rs - i2 po to show that >”, a converges 


for p > 1. Deduce that 5°, atae converges, by comparison. 


6. These last series are examples that converge slower than the geometric series; in fact they are not decided by the root and ratio tests. Are there series that converge even slower?

7. The Cauchy or Raabe tests can also be used to show that $1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots$ converges only when $p > 1$. Show further that $\sum_n \frac{1}{n}$, $\sum_n \frac{1}{n \log n}$, $\sum_n \frac{1}{n \log n \log\log n}$, ..., diverge.

8. The Weierstraß M-test (comparison test for $L^\infty$): if $\|f_n\|_{L^\infty} \le M_n$ where $\sum_n M_n$ converges, then $\sum_n f_n$ converges in $L^\infty(A)$ (i.e., uniformly). Use it to show that


the function f(x) := y Sra 1 x converges uniformly on [—1, 1]. 


9. Let $f_n(x) := e^{-nx}/n$, then $\|f_n\|_{L^1[0,1]} \le 1/n^2$, and so $\sum_n f_n$ converges in $L^1[0, 1]$.

10. What is wrong with this argument: When $\|x_n\|^{1/n} \to 1$, then $\|x_n\| \ge (1 - \epsilon)^n$ for infinitely many terms; the right-hand side sums to $1/\epsilon$, which is arbitrarily large; hence the series cannot converge absolutely.

11. A rearrangement of an absolutely convergent series also converges, to the same sum. (Hint: Eventually, the rearranged series will contain the first $N$ terms.)

12. Suppose a series $x_1 + x_2 + \cdots$ is split up into two subseries, say $x_1 + x_4 + \cdots$ and $x_2 + x_3 + \cdots$, denoted by $\sum_i x_{n_i}$ and $\sum_j x_{n_j}$. If they both converge, to $x$ and $y$ respectively, then the original series $\sum_n x_n$ also converges, to $x + y$. If one converges, and the other diverges, then the series $\sum_n x_n$ diverges. But it is possible for two subseries to diverge, yet the original series to converge; for example, $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots \to \log 2$.

13. Cesàro limit: A sequence $(x_n)$ is said to converge in the sense of Cesàro when $\frac{x_1 + \cdots + x_N}{N}$ converges. Show that if $a = \lim_{n\to\infty} x_n$ exists then the Cesàro limit is also $a$. Show that the divergent sequence $(-1)^n$ is Cesàro convergent to 0.


Remarks 7.23 


1. Weighted spaces are defined similarly to $\ell^p$ and $L^p$ but with a different measure or weight. For example, an $\ell^1_w$ space with weights $w_n > 0$ consists of sequences with bounded norms $\|x\|_{\ell^1_w} := \sum_n |x_n| w_n$. Similarly, $L^2_w(A)$ has norm $\big(\int_A |f(x)|^2 w(x)\,dx\big)^{1/2}$. In fact, weighted spaces are isomorphic to the unweighted spaces; for example $\ell^1_w \cong \ell^1$ via the map $(x_n) \mapsto (w_n x_n)$.

2. The second norm axiom requires that the field be normed. A famous theorem by Frobenius states that the only normed fields over the reals are $\mathbb{R}$ and $\mathbb{C}$.

3. Cauchy's inequality was known to Lagrange in the form

$$\sum_{n=1}^{N} a_n^2 \sum_{m=1}^{N} b_m^2 - \Big(\sum_{n=1}^{N} a_n b_n\Big)^2 = \sum_{n=1}^{N} \sum_{m>n} (a_n b_m - a_m b_n)^2.$$

4. Hausdorff's Maximality Principle $\Rightarrow$ Axiom of Choice.


Proof Let $\mathcal{A} = \{A_\alpha \subseteq X : \alpha \in I\}$ be a collection of non-empty subsets of a set $X$. Consider pairs $(J, g)$ where $J$ is a subset of $I$ and $g$ is an associated choice function $g : J \to X$, i.e., $g(\alpha) \in A_\alpha$ for all $\alpha \in J$. To prove the axiom of choice we need to show that there is a choice function $f$ with domain $I$.

Let $(J, g) \le (\tilde{J}, \tilde{g})$ mean that $J \subseteq \tilde{J}$ and $\tilde{g}$ extends $g$, i.e., $\tilde{g}(\alpha) = g(\alpha)$ whenever $\alpha \in J$. By Hausdorff's maximality principle, there is a maximal chain of nested sets and their choice functions $(J_i, g_i)$. The union $J := \bigcup_i J_i$ also has a choice function, namely $f(\alpha) := g_i(\alpha)$ whenever $\alpha \in J_i$, and it is the one sought for:

$f$ is well-defined: If $\alpha$ belongs to more than one index set, say $J_i$ and $J_j$, then without loss of generality, $J_i \subseteq J_j$ and $g_j$ extends $g_i$, say, so $g_i(\alpha) = g_j(\alpha)$.

$f$ is a choice function on $J$: If $\alpha \in J$ then $\alpha \in J_i$ for some $i$, so $f(\alpha) = g_i(\alpha) \in A_\alpha$.

$J = I$: Otherwise there is some index $\beta \in I \setminus J$, and an element $x_\beta \in A_\beta$; $f$ can be extended further to $\tilde{f}$ defined by

$$\tilde{f}(\alpha) := \begin{cases} f(\alpha) & \alpha \in J \\ x_\beta & \alpha = \beta \end{cases}.$$

Then $\tilde{f}$ is a choice function on its domain $J \cup \{\beta\}$, and extends every choice function $g_i : J_i \to X$ in the maximal chain, a contradiction. □


Chapter 8 
Continuous Linear Maps 


8.1 Operators 


In every branch of mathematics which concerns itself with sets having some partic- 
ular structure, the functions which preserve that structure, called morphisms, feature 
prominently. Such maps allow us to transfer equations from one space to another, to 
compare them with each other and state when two spaces are essentially the same, 
or if not, whether one can be embedded in the other, etc. Even in applications, it is 
often the case that certain aspects of a process are conserved. For example, a rotation 
of geometric space yields essentially the same space. The morphisms on normed 
spaces are formalized by the following definition. 


Definition 8.1 


An operator¹ is a continuous linear transformation $T : X \to Y$ between normed spaces (over the same field), that is, it preserves vector addition, scalar multiplication, and convergence,

$$T(x + y) = Tx + Ty, \qquad T(\lambda x) = \lambda Tx, \qquad T\big(\lim_{n\to\infty} x_n\big) = \lim_{n\to\infty} Tx_n.$$

A functional is a continuous linear map $\phi : X \to \mathbb{F}$ from a normed space to its field. The set of operators from $X$ to $Y$ is denoted by $B(X, Y)$, and the set of functionals, denoted by $X^*$, is called the dual space of $X$.


¹ The use of the term operator is not standardized: it may simply mean a linear transformation, or even just a function, especially outside Functional Analysis. But it is standard to write $Tx$ instead of $T(x)$.
Easy Consequences


1. $T0 = 0$.
2 TO hiking oal sae 
3. A linear map is determined by the values it takes on the unit sphere. 


A simple test for continuity of a linear transformation is the following Lipschitz 
or “bounded” property, 


Proposition 8.2 


A linear transformation $T : X \to Y$ is continuous if, and only if, $T$ is a Lipschitz map

$$\exists c > 0,\ \forall x \in X, \quad \|Tx\|_Y \le c\|x\|_X.$$


Proof The definition of a Lipschitz map reads, when applied for normed spaces, $\|f(x) - f(y)\| \le c\|x - y\|$ for some $c > 0$. When $f$ is in fact a linear map $T$, it becomes $\|T(x - y)\| \le c\|x - y\|$, or equivalently, $\|Ta\| \le c\|a\|$ for all $a \in X$. That Lipschitz maps are (uniformly) continuous is true in every metric space (Examples 4.15(3)), but can easily be seen in this context. If $x_n \to x$, then $Tx_n \to Tx$, since

$$\|Tx_n - Tx\| = \|T(x_n - x)\| \le c\|x_n - x\| \to 0.$$

Conversely, suppose the ratios $\|Tx\|/\|x\|$ are unbounded. Since scaling $x$ does not affect this ratio (because $T$ is linear), there must be vectors $x_n$ such that $\|Tx_n\| = 1$ but $\|x_n\| \le 1/n$. So $x_n \to 0$ yet $Tx_n \not\to 0$, and $T$ is not continuous. □


Proposition 8.3


If T : X — Y is an operator, 


(i) the image of a linear subspace $A$ of $X$ is a linear subspace $TA := \{Tx : x \in A\}$ of $Y$,

(ii) the pre-image of a closed linear subspace $B$ of $Y$ is a closed linear subspace $T^{-1}B := \{x \in X : Tx \in B\}$ of $X$.


The image and pre-image of convex subsets are convex.


In particular, its image $\operatorname{im} T := TX$ is a linear subspace; and its kernel $\ker T := T^{-1}0$ is a closed linear subspace.
Proof (i) Let $Tx, Ty \in TA$, then $Tx + Ty = T(x + y) \in TA$, and $\lambda Tx = T(\lambda x) \in TA$.


(ii) Let $x, y, x_n \in T^{-1}B$, that is, $Tx, Ty, Tx_n \in B$, and let $\lambda \in \mathbb{F}$. Then

$$T(x + y) = Tx + Ty \in B, \qquad T(\lambda x) = \lambda Tx \in B,$$

$$x_n \to a \ \Rightarrow\ Ta = T\big(\lim_{n\to\infty} x_n\big) = \lim_{n\to\infty} Tx_n \in B,$$

show that $T^{-1}B$ is a closed linear subspace.


(iii) Let $Tx, Ty \in TA$, where $A \subseteq X$ is a convex subset. Then for any $0 \le t \le 1$, $z := tx + (1 - t)y$ is in $A$, so

$$tTx + (1 - t)Ty = T(tx + (1 - t)y) = Tz \in TA$$

shows $TA$ is also convex. Now let $B \subseteq Y$ be convex, and let $x, y \in T^{-1}B$, i.e., $a := Tx$, $b := Ty$ are both in $B$. Then, by convexity of $B$,

$$T(tx + (1 - t)y) = ta + (1 - t)b \in B$$

and $tx + (1 - t)y \in T^{-1}B$ as required. □


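Examples 8.4

1. An operator $T$ maps the linear subspace $[\![A]\!]$ to $[\![TA]\!]$ because

$$x = \sum_{i=1}^{n} \alpha_i a_i \ \Rightarrow\ Tx = \sum_{i=1}^{n} \alpha_i Ta_i.$$

In particular it maps a straight line to another straight line (or to the origin), hence the name "linear" applied to operators.


2. ▶ A linear transformation from $\mathbb{C}^N$ to $\mathbb{C}^M$ takes the form of a matrix. Letting $\mathbb{C}^N = [\![e_1, \ldots, e_N]\!]$, $\mathbb{C}^M = [\![e'_1, \ldots, e'_M]\!]$, $x = \sum_i \alpha_i e_i = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}$, and $Te_i = \sum_j T_{ji} e'_j$, then

$$Tx = \sum_{i=1}^{N} \alpha_i Te_i = \sum_{i=1}^{N} \sum_{j=1}^{M} T_{ji}\alpha_i e'_j = \begin{pmatrix} T_{11} & \cdots & T_{1N} \\ \vdots & & \vdots \\ T_{M1} & \cdots & T_{MN} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}.$$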


Every matrix is continuous, 


N 


M M 
I7xll2< DO 1T > lai < | SOIT jl | Nilxll, Exercise 7.7(7)). 
j=l i=l j=l 
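3. A functional from $\mathbb{C}^N$ to $\mathbb{F}$ is then a $1 \times N$ matrix, otherwise known as a row vector,

$$\phi \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} = \phi\Big(\sum_{n=1}^{N} a_n e_n\Big) = \sum_{n=1}^{N} \phi(e_n) a_n = \sum_{n=1}^{N} b_n a_n = (b_1 \ \ldots\ b_N) \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}.$$


4. Generalizing this to functionals on complex sequences, let $y^\top(x) := y \cdot x := \sum_n b_n a_n$, where $x = (a_n)$ and $y = (b_n)$. Then $y^\top$ is linear,

$$y \cdot (x + x') = \sum_n b_n(a_n + a'_n) = \sum_n b_n a_n + \sum_n b_n a'_n = y \cdot x + y \cdot x',$$
$$y \cdot (\lambda x) = \sum_n b_n \lambda a_n = \lambda \sum_n b_n a_n = \lambda\, y \cdot x,$$

but may or may not be continuous, depending on $y$ and the normed spaces involved. For example, to show that $\phi(a_n) := \sum_{n=0}^{\infty} \frac{a_n}{2^n}$ defined on $\ell^\infty$ is continuous, note

$$|\phi x| = \Big|\sum_n \frac{a_n}{2^n}\Big| \le \sum_n \frac{1}{2^n} \sup_n |a_n| \le 2\|x\|_{\ell^\infty}.$$


5. When $X$ has a Schauder basis $(e_n)$, a functional must be in the form of a series:

$$\phi x = \phi\Big(\sum_n a_n e_n\Big) = \sum_n a_n \phi e_n = \sum_n b_n a_n \qquad (b_n := \phi e_n,\ a_n \in \mathbb{F}).$$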

n 


6. The identity operator $I : X \to X$, $x \mapsto x$, is trivially linear and continuous. Similarly for scalar multiplication, $\lambda : x \mapsto \lambda x$.


7. ▶ The left-shift operator $L : \ell^1 \to \ell^1$ defined by $(a_n) \mapsto (a_{n+1})$, i.e.,

$$L(a_0, a_1, a_2, \ldots) := (a_1, a_2, a_3, \ldots),$$

is onto, linear, continuous, and satisfies $\|Lx\| \le \|x\|$; its kernel is spanned by $e_0$.


Proof That $L$ is onto is obvious; linearity and continuity follow from

$$L(a_n + b_n) = (a_1 + b_1, a_2 + b_2, \ldots) = (a_1, a_2, \ldots) + (b_1, b_2, \ldots) = L(a_n) + L(b_n),$$
$$L(\lambda a_n) = (\lambda a_1, \lambda a_2, \ldots) = \lambda(a_1, a_2, \ldots) = \lambda L(a_n),$$
$$\|Lx\|_{\ell^1} = \sum_{n=1}^{\infty} |a_n| \le \sum_{n=0}^{\infty} |a_n| = \|x\|_{\ell^1}.$$

For $x \in \ker L$, $(a_1, a_2, \ldots) = Lx = 0$, so $x = (a_0, 0, 0, \ldots) = a_0 e_0$; in fact $Le_0 = 0$.


8. ▶ In general, the multiplication of sequences $x \mapsto yx$, defined by $(b_n)(a_n) := (b_n a_n)$, is linear on the vector space of sequences. When $|b_n| \le c$, it is continuous as a map $\ell^p \to \ell^p$ ($p \ge 1$); e.g. for $p = 1$,

$$\|yx\|_{\ell^1} = \sum_{n=0}^{\infty} |b_n a_n| \le c \sum_{n=0}^{\infty} |a_n| = c\|x\|_{\ell^1}.$$

In finite dimensions, this is equivalent to multiplying $x$ by a diagonal matrix.


9. Solving linear equations $Tx = b$, where $T$ and $b$ are given, is probably the single most useful application in the whole of mathematics. The complete set of solutions is $x_0 + \ker T$, where $x_0$ is any individual or particular solution $Tx_0 = b$, and $\ker T$ is the set of solutions of the homogeneous equation $Tx = 0$ (since $T(x - x_0) = 0$).


10. The kernel subspace of a functional $\ker \phi$ is called a hyperplane.
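
The left-shift and multiplication operators of Examples 7 and 8 can be tried out on finitely supported sequences. The following is a minimal sketch under that assumption: sequences are Python lists with implicit trailing zeros, and the names `norm1`, `left_shift` and `multiply` are illustrative, not from the text.

```python
def norm1(x):
    """The l^1 norm of a finitely supported sequence stored as a list."""
    return sum(abs(a) for a in x)


def left_shift(x):
    """L(a_0, a_1, a_2, ...) = (a_1, a_2, ...)."""
    return x[1:]


def multiply(y, x):
    """Termwise product (b_n)(a_n) = (b_n a_n)."""
    return [b * a for b, a in zip(y, x)]


x = [3.0, -1.0, 0.5, 2.0]
y = [0.5, -2.0, 1.0, 0.25]
c = max(abs(b) for b in y)  # a bound |b_n| <= c

print(norm1(left_shift(x)), "<=", norm1(x))       # ||Lx|| <= ||x||
print(norm1(multiply(y, x)), "<=", c * norm1(x))  # ||yx|| <= c ||x||
print(left_shift([5.0, 0.0, 0.0]))                # a multiple of e_0 is sent to 0
```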


Integral Operators 


We now consider a broad class of operators that act on spaces of functions. An integral operator (or transform) is a mapping on functions

$$Tf(y) := \int_A k(x, y) f(x)\,dx,$$

where $k$ is called the kernel of $T$ (not to be confused with $\ker T$). To motivate this definition, suppose $T$ is a linear operator that inputs a function $f : A \subseteq \mathbb{R} \to \mathbb{C}$ and outputs a function $g : B \subseteq \mathbb{R} \to \mathbb{C}$. If $A$ and $B$ are partitioned into small subintervals, the functions $f$ and $g$ are discretized into vectors $(f_i)$ and $(g_j)$, and the linear operator $T$ becomes approximately some matrix $[T_{ij}]$. As the partitions are refined, one might hope that $T_{ij}$ would converge to some function $k(x, y)$ on $A \times B$, and the finite sums involved in the matrix multiplication $\sum_i T_{ij} f_i$ become integrals $\int_A k(x, y) f(x)\,dx$. (This is not necessarily the case, as the identity map attests.)
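
The discretization described above can be sketched in a few lines. The code is an illustration under assumptions made here: a midpoint rule on $[0, 1]$, the kernel $k(x, y) = 1$ for $x \le y$ and $0$ otherwise, and the test function $f(x) = 2x$, whose integral from $0$ to $y$ is $y^2$. The names `discretize` and `apply` are hypothetical.

```python
def discretize(kernel, n):
    """Sample k(x, y) at midpoints of n subintervals of [0, 1].

    Returns the midpoints and the matrix M with M[j][i] = k(x_i, y_j) * h,
    so that (M f)_j approximates the integral of k(x, y_j) f(x) dx.
    """
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    matrix = [[kernel(x, y) * h for x in pts] for y in pts]
    return pts, matrix


def apply(matrix, vec):
    """Ordinary matrix-vector multiplication."""
    return [sum(t * v for t, v in zip(row, vec)) for row in matrix]


def step_kernel(x, y):
    """Kernel of integration from 0 to y."""
    return 1.0 if x <= y else 0.0


pts, M = discretize(step_kernel, 200)
f = [2.0 * x for x in pts]
Mf = apply(M, f)

# Compare with the exact integral y^2; the error shrinks as the partition is refined.
max_error = max(abs(g - y * y) for g, y in zip(Mf, pts))
print(max_error)
```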
Proposition 8.5


An integral operator $Tf(y) := \int_A k(x, y) f(x)\,dx$ is linear, and is continuous on the following spaces:

$$\sup_{x \in A,\, y \in B} |k(x, y)| < \infty \ \Rightarrow\ T : L^1(A) \to L^\infty(B),$$
$$\int_B \sup_{x \in A} |k(x, y)|\,dy < \infty \ \Rightarrow\ T : L^1(A) \to L^1(B),$$
$$\sup_{y \in B} \int_A |k(x, y)|\,dx < \infty \ \Rightarrow\ T : L^\infty(A) \to L^\infty(B),$$
$$\int_B \int_A |k(x, y)|\,dx\,dy < \infty \ \Rightarrow\ T : L^\infty(A) \to L^1(B).$$


Proof Linearity follows easily from

$$\int_A k(x, y)(\lambda f(x) + g(x))\,dx = \lambda \int_A k(x, y) f(x)\,dx + \int_A k(x, y) g(x)\,dx.$$

(i) Continuity on $L^1(A) \to L^\infty(B)$:

$$\|Tf\|_{L^\infty(B)} \le \sup_{y \in B} \int_A |k(x, y) f(x)|\,dx \le \sup_{x, y} |k(x, y)| \int_A |f(x)|\,dx.$$

(ii) Continuity on $L^1(A) \to L^1(B)$:

$$\|Tf\|_{L^1(B)} = \int_B \Big|\int_A k(x, y) f(x)\,dx\Big|\,dy \le \int_B \sup_{x \in A} |k(x, y)|\,dy \int_A |f(x)|\,dx.$$

(iii) Continuity on $L^\infty(A) \to L^\infty(B)$:

$$\|Tf\|_{L^\infty(B)} \le \sup_{y \in B} \int_A |k(x, y) f(x)|\,dx \le \sup_{y \in B} \int_A |k(x, y)|\,dx\ \|f\|_{L^\infty(A)}.$$

(iv) Continuity on $L^\infty(A) \to L^1(B)$:

$$\|Tf\|_{L^1(B)} \le \int_B \int_A |k(x, y) f(x)|\,dx\,dy \le \iint_{A \times B} |k(x, y)|\,dx\,dy\ \|f\|_{L^\infty(A)}.$$
Examples 8.6


1. Integration, $f \mapsto \int_A f$, is a functional on $L^1(A)$.


2. The Volterra operator on $L^1[0, 1]$ is $Vf(y) := \int_0^y f$. It is an integral operator with

$$k(x, y) := \begin{cases} 1 & x \le y \\ 0 & y < x \end{cases}.$$


3. ▶ The Fourier transform of a function $f \in L^1(\mathbb{R})$ is defined to be the function

$$\mathcal{F}f(\xi) = \hat{f}(\xi) := \int e^{-2\pi i x \xi} f(x)\,dx.$$

It is an operator $\mathcal{F} : L^1(\mathbb{R}) \to L^\infty(\mathbb{R})$.


4. For integral operators $S$, $T$, with kernels $k_S$, $k_T$ respectively,

(a) $S = T$ only when $k_S = k_T$ a.e. (since for all $f$, $(S - T)f = \int (k_S(x, y) - k_T(x, y)) f(y)\,dy = 0$);

(b) $S + T$ has kernel $k_S + k_T$, and $\lambda T$ has kernel $\lambda k_T$,

(c) $ST$ has kernel $k_{ST}(x, z) := \int k_S(y, z) k_T(x, y)\,dy$.

The kernel acts like a "matrix" with real-valued indices, $k_{x,y}$ in place of $A_{i,j}$. The properties listed here are analogous to those of the addition and multiplication of matrices.


5. Which integral operators on $L^1(\mathbb{R})$ are translation-invariant, meaning $TT_a f = T_a Tf$, where $T_a f(x) := f(x - a)$? The requirement is, for all $f \in L^1(\mathbb{R})$,

$$\int k(x, y) f(x - a)\,dx = \int k(x, y - a) f(x)\,dx.$$

By changing the $x$-variable in the left-hand integral to $\tilde{x} = x - a$, we obtain $k(x + a, y) = k(x, y - a)$ a.e., as $f$ is arbitrary. Equivalently, $k(x, y) = k(x - y, 0) =: \tilde{k}(x - y)$ a.e.$(x, y)$ for some function $\tilde{k} \in L^1(\mathbb{R})$. That is,

$$Tf = \tilde{k} * f(x) := \int \tilde{k}(x - y) f(y)\,dy,$$

called the convolution of $\tilde{k}$ with $f$.


6. An example of a functional that is not integral is given by $\delta_{x_0}(f) := f(x_0)$, acting on $C(X)$, where $x_0 \in X$.


Proof Linearity is immediate, e.g. $\delta_{x_0}(f + g) = (f + g)(x_0) = f(x_0) + g(x_0) = \delta_{x_0}(f) + \delta_{x_0}(g)$. For continuity,

$$|\delta_{x_0} f| = |f(x_0)| \le \sup_{x \in X} |f(x)| = \|f\|_{C(X)}.$$
Vito Volterra (1860-1940) studied hydrodynamics at Pisa under Betti (1883); this led him over the next ten years to consider integral equations of the type $f(x) - \int_a^x k(x, y) f(y)\,dy = g(x)$, which he showed can be solved by iteration. He applied such "functionals" to the theory of optics and distortions, Hamilton-Jacobi dynamics, elasticity and electro-magnetism. He moved from one professorship in Turin to another in Rome, becoming a senator in 1905, and finding the time to write his Volterra equations about the numbers of predators and prey in mathematical biology, until in 1931 he preferred exile to the reign of Mussolini.


Fig. 8.1 Volterra


7. Differentiation of functions is linear (say on the vector space of differentiable functions) but it is not continuous in the $\infty$-norm, e.g.

$$\|D\cos(nx)\|_{C(\mathbb{R})} = \|{-n}\sin(nx)\|_{C(\mathbb{R})} = n$$

whereas $\|\cos(nx)\|_{C(\mathbb{R})} = 1$. Similarly, $\|Dx^n\|_{C[0,1]} / \|x^n\|_{C[0,1]} \to \infty$ as $n \to \infty$. (Note: here, $x^n$ and $\cos(nx)$ denote functions.)
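
The unboundedness of differentiation in the last example can be seen numerically. The sketch below assumes the exact derivative $Dx^n = nx^{n-1}$ and approximates the sup-norm on $[0, 1]$ by sampling on a uniform grid. The helper names `sup_norm` and `ratio` are illustrative only.

```python
def sup_norm(f, m=1000):
    """Approximate the sup-norm of f on [0, 1] by sampling at i/m, i = 0..m."""
    return max(abs(f(i / m)) for i in range(m + 1))


def ratio(n):
    """The ratio ||D x^n|| / ||x^n|| in C[0, 1], using the exact derivative."""
    power = lambda x: x ** n
    derivative = lambda x: n * x ** (n - 1)
    return sup_norm(derivative) / sup_norm(power)


# The ratios grow without bound, so no constant c satisfies ||Df|| <= c ||f||.
for n in (1, 10, 100):
    print(n, ratio(n))
```

By Proposition 8.2, the absence of such a constant is exactly the failure of continuity.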


Theorem 8.7 


$B(X, Y)$ is a vector space with a norm defined by

$$\|T\| := \sup_{x \ne 0} \frac{\|Tx\|_Y}{\|x\|_X} = \sup_{\|x\| = 1} \|Tx\|_Y.$$

$B(X, Y)$ is complete when $Y$ is complete. In particular, $X^*$ is a Banach space, with norm

$$\|\phi\| = \sup_{x \ne 0} \frac{|\phi x|}{\|x\|}.$$


Proof The norm is well-defined in the sense that if $T$ is an operator, then $\|Tx\|/\|x\| \le c$ for all non-zero $x \in X$, and the least such upper bound $\|T\|$ exists. In fact, a linear map belongs to $B(X, Y)$ if, and only if, $\|T\| < \infty$, in which case

$$\|Tx\| \le \|T\|\|x\|.$$

This inequality is used extensively in the rest of the text.
Addition and scalar multiplication of operators is defined by

$$(S + T)x := Sx + Tx, \qquad (\lambda T)x := \lambda Tx.$$

That $B(X, Y)$ with these operations is a vector space is a straightforward calculation, using the linearity and continuity of these operations in $X$ and $Y$ (Proposition 7.8). For example,

$$(\lambda T)(x + y) = \lambda T(x + y) = \lambda Tx + \lambda Ty = (\lambda T)x + (\lambda T)y.$$

More crucially,

$$\|S + T\| = \sup_{\|x\| = 1} \|Sx + Tx\| \le \sup_{\|x\| = 1} (\|Sx\| + \|Tx\|) \le \sup_{\|x\| = 1} \|Sx\| + \sup_{\|x\| = 1} \|Tx\| = \|S\| + \|T\|,$$
$$\|\lambda T\| = \sup_{\|x\| = 1} \|\lambda Tx\| = \sup_{\|x\| = 1} |\lambda|\|Tx\| = |\lambda|\|T\|,$$
$$\|T\| = 0 \ \Leftrightarrow\ \forall x,\ \|Tx\| = 0 \ \Leftrightarrow\ T = 0.$$


$B(X, Y)$ is complete if $Y$ is: Let $T_n$ be a Cauchy sequence of operators in $B(X, Y)$, that is, $\|T_n - T_m\| \to 0$ as $n, m \to \infty$. Then, for each $x \in X$,

$$\|T_n x - T_m x\| \le \|T_n - T_m\|\|x\| \to 0$$

implies that $(T_n x)$ is a Cauchy sequence in $Y$, so that $T_n x$ converges to some vector which can be denoted by $T(x)$, if $Y$ is complete. We now show that $T$ is linear: letting $n \to \infty$ in

$$T_n(x + y) = T_n x + T_n y, \qquad T_n(\lambda x) = \lambda T_n x,$$

gives

$$T(x + y) = Tx + Ty, \qquad T(\lambda x) = \lambda Tx,$$

by continuity of addition and scalar multiplication.
Finally, for any $\epsilon > 0$ and any $x \in X$,

$$\|(T_n - T)x\| \le \|T_n - T_m\|\|x\| + \|T_m x - Tx\| < \epsilon\|x\| + \epsilon\|x\|,$$

where $m$ is chosen large enough, depending on $x$, to make $\|T_m x - Tx\| < \epsilon\|x\|$, and $n, m \ge N$ large enough to make $\|T_n - T_m\| < \epsilon$. Hence $\|T_n - T\| \le 2\epsilon$ for $n \ge N$. This shows that $T_n - T$, and so $T$, are continuous, and furthermore that $T_n \to T$. □


Proposition 8.8 


If $T : X \to Y$ and $S : Y \to Z$ are operators, then so is their composition $ST$, with $\|ST\| \le \|S\|\|T\|$.


$B(X) := B(X, X)$ is closed under multiplication.

Proof That $ST$ is linear is obvious: $ST(x + y) = S(Tx + Ty) = STx + STy$ and $ST(\lambda x) = S(\lambda Tx) = \lambda STx$. Also,

$$\|STx\| = \|S(Tx)\| \le \|S\|\|Tx\| \le \|S\|\|T\|\|x\|,$$

and the result follows by taking the supremum for unit vectors $x$. □
Examples 8.9 
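1. $\|0\| = 0$, $\|I\| = 1$; more generally, $\|\lambda I\| = |\lambda|$.


2. ▶ Every matrix $T : \mathbb{R}^N \to \mathbb{R}^M$ is continuous. Let a matrix $T$ have coefficients $T_{ij}$, then similar reasoning as in Proposition 8.5 shows $T$ is continuous with

(a) the 2-norms: $\|T\| \le \sqrt{\sum_{ij} |T_{ij}|^2}$, and $\|T\| \le \sqrt{\big(\max_j \sum_i |T_{ij}|\big)\big(\max_i \sum_j |T_{ij}|\big)}$,

(b) the $\infty$-norms: $\|T\| = \max_i \sum_j |T_{ij}|$,

(c) the 1-norms: $\|T\| = \max_j \sum_i |T_{ij}|$.

Note that, just like vectors in $\mathbb{R}^N$, there are various norms applicable to matrices, but that in any of them $\|T\|$ depends continuously on its coefficients: changing them slightly by at most $\epsilon$ does not change $T$ drastically, e.g.

$$\|S - T\| \le N \max_{ij} |S_{ij} - T_{ij}| \le N\epsilon.$$


Proof of (a). Let $x = (a_j)$. By Cauchy's inequality, $\big|\sum_j T_{ij} a_j\big|^2 \le \sum_j |T_{ij}|^2 \sum_j |a_j|^2$ for each $i$, so

$$\|Tx\|^2 = \sum_i \Big|\sum_j T_{ij} a_j\Big|^2 \le \sum_{ij} |T_{ij}|^2 \|x\|^2.$$

The second inequality, known as Schur's test and sometimes an improvement on the first inequality, states that $\|T\|$ is at most the geometric mean $\sqrt{cr}$ of its "largest" column and row, where $c := \max_j \sum_i |T_{ij}|$ and $r := \max_i \sum_j |T_{ij}|$. Again by Cauchy's inequality,

$$\Big|\sum_j T_{ij} a_j\Big| \le \sum_j \sqrt{|T_{ij}|}\,\sqrt{|T_{ij}|}\,|a_j| \le \sqrt{\sum_j |T_{ij}|}\ \sqrt{\sum_j |T_{ij}||a_j|^2},$$

$$\|Tx\|^2 = \sum_i \Big|\sum_j T_{ij} a_j\Big|^2 \le \sum_i \Big(\sum_j |T_{ij}|\Big)\Big(\sum_j |T_{ij}||a_j|^2\Big) \le r \sum_{ij} |T_{ij}||a_j|^2 \le rc\|x\|^2.$$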
ij 


3. The norm of the operator $y^\top$ is $\|y\|_{\ell^\infty}$ when considered as a map $\ell^1 \to \mathbb{F}$.

Proof Taking $x = (a_n)$, $y = (b_n)$,

$$|y \cdot x| \le \sum_n |b_n||a_n| \le \big(\sup_n |b_n|\big) \sum_n |a_n| = \|y\|_{\ell^\infty}\|x\|_{\ell^1}$$

gives $\|y^\top\| \le \|y\|_{\ell^\infty}$. Since the supremum $\|y\|_{\ell^\infty}$ is a boundary point of the set $\{|b_n| : n = 0, 1, \ldots\}$, there is a sequence $|b_{n_i}| \to \|y\|_{\ell^\infty}$, so that $\|y^\top\| \ge \|y\|_{\ell^\infty}$,

$$\|y^\top\| \ge |y \cdot e_{n_i}| = |b_{n_i}| \to \|y\|_{\ell^\infty} \qquad (\|e_{n_i}\| = 1).$$


4. ▶ Any linear continuous operator on normed spaces, $T : X \to Y$, is Lipschitz, hence uniformly continuous. For example, it maps the ball $B_r(x)$ into the ball $B_{\|T\|r}(Tx)$ (Exercise 4.17(3)). By Theorem 4.13, it can be extended uniquely to an operator on their (Banach) completion spaces, $\tilde{T} : \tilde{X} \to \tilde{Y}$. This extension remains linear and continuous, and retains the same norm, $\|\tilde{T}\| = \|T\|$.


Proof For any vector $x \in \tilde{X}$, there exist vectors $x_n \in X$ such that $x_n \to x$; let $\tilde{T}x := \lim_{n\to\infty} Tx_n$. Then, for any other vector $y \in \tilde{X}$, with $y_n \to y$, $y_n \in X$,

$$\tilde{T}(\lambda x + y) = \lim_{n\to\infty} T(\lambda x_n + y_n) = \lim_{n\to\infty} (\lambda Tx_n + Ty_n) = \lambda\tilde{T}(x) + \tilde{T}(y),$$

$$\|\tilde{T}x\| = \lim_{n\to\infty} \|Tx_n\| \le \|T\| \lim_{n\to\infty} \|x_n\| = \|T\|\|x\|.$$

So $\|\tilde{T}\| \le \|T\|$, but, as the domain of $\tilde{T}$ includes that of $T$, equality holds.
5. ITI < Sl # Tx|| < Sxl, for example, T = I, S = (6 §).% = (?). 
6. Let $\phi \in X^*$ and $y \in Y$; then the map $y\phi : x \mapsto (\phi x)y$ is continuous and linear, with $\|y\phi\| = \|y\|\|\phi\|$.


Proof

$$\|y\phi\| = \sup_{\|x\| = 1} \|(\phi x)y\| = \Big(\sup_{\|x\| = 1} |\phi x|\Big)\|y\| = \|\phi\|\|y\|.$$


7. Suppose we wish to find the solution of $Tx = y$ ($T \in B(X, Y)$), but it is time-consuming or impossible to calculate $T^{-1}$. If $S \in B(X, Y)$ is easily inverted and close to $T$, i.e., $T = S + R$ and $\|R\| < \|S^{-1}\|^{-1}$, then $\|S^{-1}R\| < 1$, and the iteration

$$x_{n+1} := x_n + S^{-1}(y - Tx_n) = S^{-1}(y - Rx_n)$$

converges to the solution of the equation by the Banach fixed point theorem.
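
The iteration of Example 7 can be sketched for a small matrix equation. The code below is an illustration under assumptions made here: $S$ is taken to be the diagonal part of $T$, so that $S^{-1}$ is a componentwise division, and the $2 \times 2$ system is chosen so that the exact solution is $(1, 2)$. The function name `splitting_iteration` is hypothetical.

```python
def splitting_iteration(T, y, steps=50):
    """Iterate x <- S^{-1}(y - R x), where S = diag(T) and R = T - S.

    Assumes the diagonal entries of T are non-zero and that the
    off-diagonal part is small enough for the iteration to contract.
    """
    n = len(y)
    x = [0.0] * n
    for _ in range(steps):
        # (R x)_i sums the off-diagonal entries of row i; dividing by T[i][i] applies S^{-1}.
        x = [
            (y[i] - sum(T[i][j] * x[j] for j in range(n) if j != i)) / T[i][i]
            for i in range(n)
        ]
    return x


T = [[4.0, 1.0], [2.0, 5.0]]
y = [6.0, 12.0]
solution = splitting_iteration(T, y)
print(solution)  # close to the exact solution (1, 2)
```

With a diagonal $S$ this is the Jacobi method; other easily inverted choices of $S$ give other classical schemes.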


Exercises 8.10 
1. Show that the following are continuous functionals, 


(a) ox := Oe 174n on 07; 


(b) ox = Voy clan, and $\phi x := \sum_{n=0}^{\infty} \sin(n\omega)\,a_n$ on $\ell^1$ ($\omega$ is a fixed real number);
(c) $\delta_1 x := a_1$ on $\ell^1$, $\ell^2$, $\ell^\infty$.


2. If $(e_n)$ is a Schauder basis, with $x = \sum_n a_n e_n$ for each $x$, show that the map $x \mapsto a_n$ is linear. (That it is also continuous is true in a Banach space, but not obviously.)


3. ▶ The right-shift operator is defined by $R(a_n) := (0, a_0, a_1, \ldots)$. It is an operator that satisfies $\|Rx\| = \|x\|$ both as $\ell^\infty \to \ell^\infty$ and $\ell^1 \to \ell^1$; it is 1-1 and its image is closed. Note that $LR = I \ne RL$. Show that it is also continuous as $R : \ell^1 \to \ell^\infty$.


4. The mapping $T : \ell^1 \to \ell^1$, defined by $T(a_n) := (a_0, a_1/2, a_2/3, \ldots)$, is linear


and continuous. It is 1-1, and its image, denoted Lt :=imT c 2£!, is not closed 
in @!, (Hint: consider (1, 1/2,..., 1/n,0,0,0,...).) 


. The mapping D : et — ¢}, defined by D(a,) := (nay), 1s linear and invertible, 


but not continuous. (Hint: D(e,/n) = en.) 


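6. Other examples of operators (on $\ell^1$ or $\ell^\infty$) are

$$S(a_n) := (a_1, a_0, a_3, a_2, \ldots), \qquad T(a_n) := (a_{n+1} - a_n).$$


7. Conjugation in $\mathbb{C}$, $z \mapsto \bar{z}$, is continuous but not linear. It is conjugate-linear, because $\overline{\lambda z} = \bar{\lambda}\bar{z} \ne \lambda\bar{z}$ in general.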


8. $T\overline{[\![A]\!]} \subseteq \overline{[\![TA]\!]}$ for a continuous linear operator $T$.

9. If a linear map is continuous at one point, say 0, then it is continuous everywhere.


10. When $T : X \to Y$ is 1-1 and linear, then the map $x \mapsto \|Tx\|$ is a norm on $X$.


11. When $\operatorname{im} T$ and/or $\ker T$ are finite-dimensional, their dimensions are called the rank and nullity of $T : X \to Y$. For matrices,

(a) $\operatorname{rank}(ST) \le \min(\operatorname{rank}(S), \operatorname{rank}(T))$, $\operatorname{rank}(S + T) \le \operatorname{rank}(S) + \operatorname{rank}(T)$,

(b) $\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim X$,

(c) Sylvester's inequality: $\operatorname{nullity}(ST) \le \operatorname{nullity}(S) + \operatorname{nullity}(T)$.

12. Typical examples of functionals acting on functions are of the form $f \mapsto \int k(x) f(x)\,dx$, where $k$ has to satisfy certain properties for the functional to be continuous. For example, $\phi f := \int_0^\infty e^{-x} f(x)\,dx$ is a functional on $L^\infty[0, \infty[$.


The integral operator Tf(y) := fae ee f(x) dx is continuous as 
L™[1, cof > L™[1, oof, satisfying ||Tfllz < Il fllze- 


14. Some examples of continuous linear maps on $C(\mathbb{R})$ are:

(a) $Tf(x) := (f(x) + f(-x))/2$,

(b) Translations $T_a f(x) := f(x - a)$; they are isometries and form a group with $T_a T_b = T_{a+b}$, $I = T_0$, $T_a^{-1} = T_{-a}$,

(c) Multipliers $M_g f(x) := g(x) f(x)$, where $g \in C(\mathbb{R})$.

What are their kernels and image subspaces?


15. Find, where possible, the norms of the above mentioned operators. For example, $\|\delta_{x_0}\| = 1$ on $C(X)$, and the Volterra operator on $L^\infty[0, 1]$ has norm 1.


It is not so easy to calculate || 7 || in general, even when T is a matrix. Show that, 
Xr 1 
(6 p.) | = maxtlal, lal) and ||(9 4) | = 1 = 


| (2, b) |. If you feel up to it, show that for real 2 x 2 matrices, 


with the Euclidean norms, 


| 9) | By oe aetna CE CRC 
cd ~ 2 


(Hint: Use Lagrange multipliers to find the maximum of (ax +by)?+ (ex +dy)? 
subject to cae y? = 1. See also Exercise 15.20(7).) 


17. An integral operator $T : L^1[0, 1] \to L^\infty[0, 1]$, with kernel $k \in L^\infty[0, 1]^2$, has $\|T\| \le \|k\|_{L^\infty}$. So if $T_n$ have kernels $k_n$ with $k_n \to k$ in $L^\infty[0, 1]^2$, then $T_n \to T$.


18. If $T_n x_n \to 0$ for any choice of unit vectors $x_n$, then $T_n \to 0$.

19. If $S, T \in B(X)$ commute, $ST = TS$, then $S$ preserves $\ker T$ and $\operatorname{im} T$.


20. An 'affine' map $f(x) := a + Tx$ with $T \in B(X)$ is a contraction mapping when $\|T\| < 1$. The iteration $x_{n+1} := a + Tx_n$, starting from any $x_0$, converges to its fixed point $y = a + Ty$ (Theorem 4.16).


21. Let $Ax = b$ be a matrix equation, where $A$ is a square matrix. Use Example 8.9(7) above to describe iterative algorithms for finding the solution of the equation in the following cases:

(a) (Jacobi) $A$ is almost diagonal in the sense that $A = D + R$, with $D$ being the diagonal of $A$, and $\|R\| < \|D^{-1}\|^{-1}$.

(b) (Gauss-Seidel) $A$ is almost a lower triangular matrix, in the sense that $A = L + U$ where $L$ is lower triangular and $\|U\| < \|L^{-1}\|^{-1}$. The inverse of a triangular matrix is fairly easy to compute.


22. Perturbation Theory. When the solution of an invertible linear equation $Sx_0 = y$ is known, one can also find the solutions of 'nearby' equations $(S + \epsilon E)x = y$, where $\epsilon E$ is a 'perturbation'. Writing $E = -ST$, the new solution satisfies $(I - \epsilon T)x = x_0$. We might try an expansion of the type $x = x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots$; show that $x_{n+1} = Tx_n$, and the series converges if $\|E\| < \|S^{-1}\|^{-1}$ and $\epsilon < 1$.
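
Since the exact value of $\|T\|$ is hard to find even for matrices, the bounds of Example 8.9(2)(a) are useful in practice. The following sketch compares them with a numerical estimate of the Euclidean operator norm. It assumes a power iteration on $T^\top T$ as the estimation method (a standard technique, not taken from the text), a non-zero real matrix, and a starting vector that is not orthogonal to the dominant direction. All function names are illustrative.

```python
import math


def matvec(T, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]


def norm2(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))


def op_norm_estimate(T, steps=100):
    """Estimate sup ||Tx|| over unit vectors x by power iteration on T^T T."""
    Tt = [list(col) for col in zip(*T)]
    x = [1.0] * len(T[0])
    for _ in range(steps):
        w = matvec(Tt, matvec(T, x))
        size = norm2(w)
        x = [c / size for c in w]
    return norm2(matvec(T, x))  # x is a unit vector


def frobenius_bound(T):
    """First bound of (a): square root of the sum of squared coefficients."""
    return math.sqrt(sum(t * t for row in T for t in row))


def schur_bound(T):
    """Second bound of (a): geometric mean of largest row and column sums."""
    r = max(sum(abs(t) for t in row) for row in T)
    c = max(sum(abs(t) for t in col) for col in zip(*T))
    return math.sqrt(r * c)


T = [[1.0, 2.0], [3.0, 4.0]]
estimate = op_norm_estimate(T)
print(estimate, frobenius_bound(T), schur_bound(T))
```

For this particular matrix the first bound is the sharper of the two; for other matrices Schur's test can do better.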
We sometimes need to show that two normed spaces are essentially the same, meaning that any process involving addition, scalar multiplication, or convergence, in one space is mirrored in precise fashion in the other space, and vice-versa. This is the idea of an isomorphism.


Definition 8.11


An isomorphism between normed vector spaces is a bijective map $T : X \to Y$ such that both $T$ and $T^{-1}$ are linear and continuous. The spaces are then said to be isomorphic to each other, $X \cong Y$.

An isometric isomorphism is an isomorphism that preserves distance, $\|Tx\|_Y = \|x\|_X$ for all $x \in X$, and isometrically isomorphic spaces are denoted by $X \equiv Y$.

We say that $X$ is embedded in $Y$, denoted $X \mathrel{\underset{\sim}{\subset}} Y$, when $X \cong Z \subseteq Y$ for some subspace $Z$, and the isomorphism $X \to Z$ is called an embedding.


Thus, isomorphic normed spaces are isomorphic as vector spaces and homeomor- 
phic (in fact equivalent) as metric spaces. Intuitively speaking, if X is embedded in 
Y, one can treat it as if it were a subspace of Y even if its elements are not in Y. 

Isomorphisms are also important in practical applications of functional analysis, where linear equations of the type $Tx = y$, with $y$ given, are very common. Three requirements are prescribed for such an equation to be well-posed: (i) a solution exists, (ii) the solution is unique, and (iii) the solution is stable, i.e., small variations in $y$ do not lead to sudden large changes in $x$, in other words, $x$ depends continuously on $y$. In operator terminology, this means that $T$ is (i) onto, (ii) 1-1, and (iii) $T^{-1}$ is continuous.


Proposition 8.12 


If $T : X \to Y$ is a bijective linear map, then $T^{-1}$ is linear, and is continuous when $c\|x\|_X \le \|Tx\|_Y$ for some $c > 0$.
When $T$ is an isomorphism, $\|T^{-1}\| \ge \|T\|^{-1}$.


Proof Let $T$ be a bijective linear map, let $x, y \in Y$, and let $u := T^{-1}x$, $v := T^{-1}y$; then $T(u + v) = Tu + Tv = x + y$, so that $u + v = T^{-1}(x + y)$. Similarly $T(\lambda u) = \lambda Tu = \lambda x$ gives $T^{-1}(\lambda x) = \lambda u = \lambda T^{-1}x$. This shows $T^{-1}$ is linear.
The inverse is continuous when $\|T^{-1}y\| \le c\|y\|$ for all $y \in Y$, in particular for $y = Tx$: $\|x\| \le c\|Tx\|$ for all $x \in X$. Since $T$ is onto, the two inequalities are logically equivalent.
By the previous proposition, $1 = \|I\| = \|TT^{-1}\| \le \|T\|\|T^{-1}\|$. □
Examples 8.13


1. ▶ Suppose a vector space $X$ is normed in two ways, giving two normed spaces $X_{\|\cdot\|}$ and $X_{|||\cdot|||}$. The two norms are equivalent if, and only if, the identity map $I : X_{\|\cdot\|} \to X_{|||\cdot|||}$ is an isomorphism (Example 7.3(9)); equivalently, there are constants $c, d > 0$,

$$\forall x, \quad c\|x\| \le |||x||| \le d\|x\|.$$

For example, $\mathbb{R}^N$ with the 1-norm is equivalent to $\mathbb{R}^N$ with the $\infty$-norm.


2. $\ell^1$ is not isomorphic to $\ell^\infty$. It is not enough to exhibit a sequence, such as $(1, 1, \ldots)$, which belongs to $\ell^\infty$ but not to $\ell^1$, because such a sequence may, in principle, correspond to some other sequence in $\ell^1$. One must demonstrate a property that $\ell^1$ satisfies but $\ell^\infty$ doesn't; e.g. we will show later on that the former, but not the latter, is separable.


3. ▶ The inequality $c\|x\| \le \|Tx\|$ ($c > 0$), valid for all $x$ in a Banach space $X$, implies that $\operatorname{im} T$ is closed and $T$ is 1-1.


Proof If $Tx = Ty$, then $c\|x - y\| \le \|Tx - Ty\| = 0$ and $x = y$. Suppose $Tx_n \to y$ in $Y$; then $c\|x_n - x_m\| \le \|Tx_n - Tx_m\| \to 0$ as $n, m \to \infty$, so $(x_n)$ is Cauchy and converges to, say, $x \in X$. By continuity of $T$, $Tx_n \to Tx = y$, hence $y \in \operatorname{im} T$ and $\operatorname{im} T$ is closed.


Exercises 8.14 


1. (a) The map $\binom{a_1}{a_2} \mapsto (0, a_1, a_2, 0, 0, \ldots)$ embeds $\mathbb{R}^2$ in the real space $\ell^1$.
(b) The map $J : (a_n) \mapsto (a_n/2^n)$, $\ell^\infty \to \ell^1$, is 1-1, linear, and continuous,
but is not an embedding ($\|x\|_{\ell^\infty} \not\le c\|Jx\|_{\ell^1}$).

2. An infinite-dimensional space may be properly embedded in itself: for example,
the right-shift operator $R : \ell^\infty \to \operatorname{im} R \subset \ell^\infty$ is an embedding. This cannot
happen in finite dimensions.

3. Separate each sequence $x = (a_n)$ into two parts $x_e := (a_0, a_2, \ldots)$ and $x_o :=
(a_1, a_3, \ldots)$. Then the map $x \mapsto (x_e, x_o)$ is an isometric isomorphism
$\ell^1 \cong \ell^1 \times \ell^1$.

4. The space $\ell^1(\mathbb{Z})$ consists of 'sequences' $\ldots, a_{-2}, a_{-1}, a_0, a_1, a_2, \ldots$ such that
$\sum_{n=-\infty}^{\infty} |a_n| < \infty$. It contains $\ell^1$ as a proper subspace, even if $\ell^1 \cong \ell^1(\mathbb{Z})$.

5. Consider a well-posed linear equation $Tx = y$. An error $\delta y$ in $y$ gives a
corresponding fluctuation $\delta x$ in the solution $x$, $T(x + \delta x) = y + \delta y$. Show that

\[ \frac{\|\delta x\|}{\|x\|} \le \|T\|\|T^{-1}\| \frac{\|\delta y\|}{\|y\|}. \]

The number $\|T^{-1}\|\|T\|$ is called the condition number of $T$. If it is relatively
large, then the equation is said to be ill-conditioned because the relative error of
the solution could be larger than that of the data.


6. * Let $T : \ell^\infty \to \ell^\infty$ be an operator with matrix coefficients $T_{ij}$, i.e., it maps a
sequence $(a_j)_{j \in \mathbb{N}} \in \ell^\infty$ to $(\sum_j T_{ij} a_j)_{i \in \mathbb{N}} \in \ell^\infty$. Suppose also that the matrix
is dominated by its diagonal, meaning that for some $c > 0$ and every $i$,

\[ |T_{ii}| - \sum_{j \ne i} |T_{ij}| \ge c. \]

Then $\|Tx\| \ge c\|x\|$. (Hint: use $|a + b| \ge |a| - |b|$.)

7. * If $X_1$ and $X_2$ are isomorphic then so are their completions, $\widetilde{X}_1 \cong \widetilde{X}_2$.

8. * If $X_1 \cong X_2$ and $Y_1 \cong Y_2$ then $B(X_1, Y_1) \cong B(X_2, Y_2)$.


Projections 


Our next aim is to show firstly that all N-dimensional spaces are isomorphic to each 
other (for each $N$), and secondly to seek an analogue of the first isomorphism theorem
of vector spaces, namely $V/\ker T \cong \operatorname{im} T$. Accordingly we need to introduce an
important type of operator called a projection, and then construct quotient spaces. 


Definition 8.15 
A projection is a continuous linear map $P : X \to X$ such that $P^2 = P$.


For example, shadows are the projection of objects in $\mathbb{R}^3$
to shapes in a two-dimensional plane; a flat object on the
ground is its own shadow.

Playing around with the definition gives a number of conse- 
quences: 


Examples 8.16 
1. $(I - P)^2 = I - 2P + P^2 = I - P$ is also a projection.

2. $(I - P)P = 0$, so $x \in \operatorname{im} P \Leftrightarrow x - Px = 0$, and $\operatorname{im} P = \ker(I - P)$ is a closed
subspace. Similarly $\operatorname{im}(I - P) = \ker(I - I + P) = \ker P$.

3. Any $x \in X$ can be written as $x = Px + (I - P)x \in \operatorname{im} P + \ker P$. If $x \in
\operatorname{im} P \cap \ker P = \ker(I - P) \cap \ker P$, then $x = Px + (I - P)x = 0$, so that
$X = \operatorname{im} P \oplus \ker P$.

4. Any linear map on a Banach space, which satisfies $P^2 = P$, is automatically
continuous when $\operatorname{im} P$ and $\ker P$ are closed subspaces, but more powerful results
are needed to show this (Proposition 11.5).
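
The identities in Examples 8.16 can be checked on a small matrix. The sketch below assumes a hypothetical oblique projection on $\mathbb{R}^2$ (integer entries, so that all comparisons are exact); the matrix $P$, the vector $x$, and the helper names are illustrative assumptions.

```python
# Sketch: P^2 = P, (I - P)^2 = I - P, P(I - P) = 0 and x = Px + (I - P)x.

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, x):
    """Matrix-vector product."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]

I = [[1, 0], [0, 1]]
P = [[1, 1], [0, 0]]   # hypothetical example: projects onto the first axis along (1, -1)
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]   # Q = I - P

is_projection_P = matmul(P, P) == P
is_projection_Q = matmul(Q, Q) == Q
PQ = matmul(P, Q)                      # the zero matrix

x = [3, 4]
Px, Qx = apply(P, x), apply(Q, x)      # Px lies in im P, Qx lies in ker P
recombined = [p + q for p, q in zip(Px, Qx)]
print(is_projection_P, is_projection_Q, PQ, Px, Qx, recombined)
```

The decomposition $x = Px + (I - P)x$ splits $(3, 4)$ into $(7, 0) \in \operatorname{im} P$ and $(-4, 4) \in \ker P$.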


Exercises 8.17 


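1. Show that the following are projections:

(a) $\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$ and $\frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$; they have the same image, but different kernels, and
their norms are $\sqrt{2}$ and 1 respectively.
(b) $P := (\ldots)$ and $Q := (\ldots)$; $\ker P = \operatorname{im} Q$, so $PQ = 0$ is a projection
but $QP$ is not.
(c) $RL$, where $R$ and $L$ are the shift-operators.
(d) $x\phi \in B(X)$, where $\phi \in X^*$ and $x \in X$ such that $\phi x = 1$; in this case,
$X = [[x]] \oplus \ker\phi$.

2. If $P$ and $Q$ are commuting projections, then $PQ$ projects onto $\operatorname{im} P \cap \operatorname{im} Q$,
and $P + Q - PQ$ projects onto $\operatorname{im} P + \operatorname{im} Q$.

3. By induction, if $I = P_1 + \cdots + P_n$, with the projections $P_i$ satisfying $P_i P_j = 0$
for $i \ne j$, then $X = \operatorname{im} P_1 \oplus \cdots \oplus \operatorname{im} P_n$.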


4. ** Given a closed linear subspace, is there always a projection that maps onto it? 


8.2 Quotient Spaces 


A linear subspace M of a vector space can be translated to form cosets x + M. For 
example, a straight line $L \subset \mathbb{R}^2$ passing through the origin gives the parallel copies
$x + L$. With some translations, however, the resulting line is indistinguishable from
$L$; it is easy to see that $x + L = L \Leftrightarrow x \in L$. More generally, $x + L = y + L \Leftrightarrow
x - y \in L$. This latter is an equivalence relation (check!), so the space $\mathbb{R}^2$ 'foliates'
into a stack of parallel lines, each a coset $x + L$. It is obvious that when a line $L$
is translated by $x$, and then by $y$, the result is the line $(x + y) + L$; in fact, since
translation in the direction of $a \in L$ is irrelevant to the coset, one can even talk about
the addition of lines, $(x + L) + (y + L)$ as meaning $x + (y + L)$. Similarly lines can
be stretched, $\lambda(x + L) = \lambda x + L$ (unless $\lambda = 0$), and the distance between lines is
defined in elementary geometry as the minimum distance between them. This space 
of parallel lines is a good candidate for a normed space. 

Turning to the general case, a vector space partitions into the cosets of M to form 
a vector space X/M, which is normed when M is closed, and complete when X is 
complete:

Proposition 8.18


If $X$ is a normed space and $M$ is a closed linear subspace, then the space
of cosets
\[ X/M := \{\, x + M : x \in X \,\} \]
is a normed space with addition, scalar multiplication, and norm defined
by
\[ (x + M) + (y + M) := (x + y) + M, \]
\[ \lambda(x + M) := \lambda x + M, \]
\[ \|x + M\| := d(x, M) = \inf_{v \in M} \|x - v\|. \]

If $M$ is complete, then $X/M$ is complete $\Leftrightarrow$ $X$ is complete.


Proof That the relation $x - y \in M$ is an equivalence relation with equivalence classes
$x + M$, and that the defined addition and scalar multiplication of these classes satisfy
the axioms of a vector space should be clear; the zero coset is $M$ and the negative of
$x + M$ is $-x + M$. Let us show that we do indeed get a norm:

\[
\begin{aligned}
\|(x + M) + (y + M)\| = \|x + y + M\| &= \inf_{w \in M} \|x + y - w\| \\
&= \inf_{u, v \in M} \|x + y - u - v\| \\
&\le \inf_{u, v \in M} (\|x - u\| + \|y - v\|) \\
&= \inf_{u \in M} \|x - u\| + \inf_{v \in M} \|y - v\| \\
&= \|x + M\| + \|y + M\|,
\end{aligned}
\]
\[
\begin{aligned}
\|\lambda(x + M)\| = \|\lambda x + M\| &= \inf_{v \in M} \|\lambda x - v\| \\
&= \inf_{u \in M} \|\lambda x - \lambda u\| \quad (\text{for } \lambda \ne 0) \\
&= \inf_{u \in M} |\lambda|\|x - u\| \\
&= |\lambda|\|x + M\|,
\end{aligned}
\]
\[ \|x + M\| = \inf_{v \in M} \|x - v\| \ge 0, \]
\[ \|x + M\| = 0 \Leftrightarrow d(x, M) = 0 \Leftrightarrow x \in \overline{M} = M \Leftrightarrow x + M = 0 + M \quad \text{(Exercise 2.20(9))}. \]

Completeness. Let $\sum_n (x_n + M)$ be an absolutely convergent series in $X/M$, i.e.,
$\sum_n \|x_n + M\|$ converges. Now, for each $n$, there is a $v_n \in M$ such that

\[ \|x_n - v_n\| < \|x_n + M\| + 1/2^n. \]

The left-hand side can be summed by comparison with the right, so $\sum_n (x_n - v_n)$
converges to some $x$, since $X$ is complete (Proposition 7.21). Thus

\[ \Bigl\| \sum_{n=1}^{N} (x_n + M) - (x + M) \Bigr\| = \Bigl\| \sum_{n=1}^{N} x_n - x + M \Bigr\| \le \Bigl\| \sum_{n=1}^{N} (x_n - v_n) - x \Bigr\| \to 0 \]

since in general $\|a + M\| \le \|a + v\|$ for any $v \in M$. Hence $\sum_n (x_n + M)$ converges,
along with every other absolutely summable series, and $X/M$ is complete.
Conversely, let $(x_n)$ be a Cauchy sequence in $X$; then

\[ \|(x_n + M) - (x_m + M)\| = \|x_n - x_m + M\| \le \|x_n - x_m\| \]

implies that $(x_n + M)$ is Cauchy in $X/M$, so converges to, say, $x + M$. This means
there are $v_n \in M$ such that $x_n - (x + v_n) \to 0$; but then,

\[ \|v_n - v_m\| \le \|x_n - x_m - v_n + v_m\| + \|x_n - x_m\| \to 0 \]

shows $(v_n)$ is Cauchy in $M$ and converges to, say, $v \in M$. Thus $x_n \to x + v$. □


If M is a linear subspace of X such that X/M is finite dimensional, then its 
codimension is defined by codim M: = dim(X/M). 


Examples 8.19 


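1. The cosets of the closed subspace $M := [[\binom{1}{1}]] \subset \mathbb{R}^2$ are the lines parallel to $M$,
and $\mathbb{R}^2/M \cong \mathbb{R}$.

Proof A vector $x$ belongs to $x_0 + M$ when $x = x_0 + t\binom{1}{1}$ for some $t \in \mathbb{R}$, which
is the equation of a line parallel to $\binom{1}{1}$. The map $a \mapsto \binom{a}{0} + M$, $\mathbb{R} \to \mathbb{R}^2/M$, is
linear and continuous. It is bijective since $\binom{a_1}{a_2} + M = \binom{a_1 - a_2}{0} + M$ and

\[ \binom{a_1}{0} + M = \binom{a_2}{0} + M \Leftrightarrow \binom{a_1 - a_2}{0} = \lambda\binom{1}{1} \Leftrightarrow a_1 = a_2. \]

The inverse map is continuous as the distance $\|\binom{a}{0} + M\|$ equals $|a|/\sqrt{2}$.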


2. If $X$ is finite-dimensional, then so is $X/M$, with

\[ \operatorname{codim} M = \dim X/M = \dim X - \dim M. \]

Proof Let $e_1, \ldots, e_m$ be a basis for $M$, extended by $e_{m+1}, \ldots, e_n$ to a basis for
$X$. Then, for any vector $x = \sum_{i=1}^{n} \lambda_i e_i$, its coset,

\[ x + M = \sum_{i=1}^{n} \lambda_i e_i + M = \sum_{i=m+1}^{n} \lambda_i (e_i + M), \]

is generated by $e_{m+1} + M, \ldots, e_n + M$. Moreover, these are linearly independent,
since

\[ \sum_{i=m+1}^{n} \lambda_i e_i + M = 0 + M \Leftrightarrow \sum_{i=m+1}^{n} \lambda_i e_i = \sum_{i=1}^{m} \alpha_i e_i \in M \Leftrightarrow \lambda_i = 0,\ i = m+1, \ldots, n. \]

Hence $\dim X/M = n - m$.


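3. If $\phi \in X^*$ then $\|x + \ker\phi\| = \dfrac{|\phi x|}{\|\phi\|}$.

Proof If $\phi x \ne 0$, then $X = [[x]] \oplus \ker\phi$, and

\[ \|\phi\| = \sup_{y \ne 0} \frac{|\phi y|}{\|y\|} = \sup_{\lambda \ne 0,\ a \in \ker\phi} \frac{|\lambda||\phi x|}{\|\lambda x + a\|} = \frac{|\phi x|}{\inf_{a \in \ker\phi} \|x + a\|}. \]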

The following proposition states, in effect, that when one translates a closed linear 
subspace to any distance c < 1 from the origin, the resulting coset intersects the unit 
sphere: 


Proposition 8.20 Riesz’s lemma 


For any non-trivial closed linear subspace $M$, and $0 < c < 1$, there is a
unit vector $x$ such that $\|x + M\| = c$.


Proof Let $y \notin M$ so that $\|y + M\| > 0$; by re-scaling $y$ if necessary, one can assume
$\|y + M\| = c$. The map $f : M \to \mathbb{R}$, defined by $f(a) := \|y + a\|$, takes values close
to $c$, as well as arbitrarily large values ($\|y + \lambda a\| \ge |\lambda|\|a\| - \|y\| \to \infty$ as $\lambda \to \infty$,
for $M \ne 0$). Since $M$ is connected, and $f$ is continuous, its image must include $]c, \infty[$
by the intermediate value theorem (Proposition 5.6). In particular there is an $a \in M$
such that $\|y + a\| = 1$, so letting $x := y + a$ gives $\|x + M\| = \|y + M\| = c$. □


Exercises 8.21 


1. ▶ The mapping $x \mapsto x + M$, $X \to X/M$, is linear and continuous.


2. Let $M := \{\, f \in C[0, 1] : f(0) = 0 \,\}$, then $2 + M = \{\, f \in C[0, 1] : f(0) = 2 \,\}$,
and $C[0, 1]/M \cong \mathbb{C}$.




3. (a) $X/X \cong 0$, $X/0 \cong X$.
(b) If $X$, $Y$ are normed spaces, then $\dfrac{X \times Y}{X \times 0} \cong Y$.


4. Let $X$ be a finite-dimensional space generated by a set of unit vectors $E :=
\{\, e_i : i = 1, \ldots, n \,\}$, and let $M_i := [[E \setminus \{e_i\}]]$. Then the coefficient $|\alpha_i|$ in
$x = \sum_{i=1}^{n} \alpha_i e_i$ is at most $\|x\|/\|e_i + M_i\|$. Thus, in finding a basis for $X$, it is
best to select unit vectors that are as 'far' from each other as possible.


5. Let $M$ be a closed subspace of $X$. If both $M$ and $X/M$ are separable, then so is $X$.


8.3 $\mathbb{R}^N$ and Totally Bounded Sets


That finite-dimensional normed spaces ought to be better behaved than infinite- 
dimensional ones is to be expected. What is slightly surprising is the following result 
that they allow only a unique way of defining convergence: Any norm on $\mathbb{C}^N$ is
equivalent to the complete Euclidean norm. This is an example of a mathematical
“small is beautiful” principle, in the same league of results as “finite integral domains 
are fields”. 


Theorem 8.22 


Every $N$-dimensional normed space over $\mathbb{C}$ is isomorphic to $\mathbb{C}^N$, and so is
complete.


The theorem is also true for real finite-dimensional normed spaces: they are
isomorphic to $\mathbb{R}^N$.


Proof Let $X$ be an $N$-dimensional normed space, with a basis of unit vectors
$e_1, \ldots, e_N$, and let $\mathbb{C}^N$ be given the complete 1-norm (Example 7.16(3)). There
is a map between them, $J : \mathbb{C}^N \to X$, defined by

\[ x = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} \mapsto \alpha_1 e_1 + \cdots + \alpha_N e_N. \]

Linearity of $J$ follows from the distributive laws of vectors; that it is 1-1 and onto
follow from the linear independence and spanning of $\{e_n\}$ respectively.
$J$ is continuous since

\[ \|Jx\|_X = \|\alpha_1 e_1 + \cdots + \alpha_N e_N\|_X \le |\alpha_1| + \cdots + |\alpha_N| = \|x\|_1. \]

To show $J^{-1}$ is continuous, let $f(x) := \|Jx\|$, which is a composition of two
continuous functions: the norm and $J$. The unit sphere $S := \{\, u \in \mathbb{C}^N : \|u\|_1 = 1 \,\}$
is a compact set (since it is closed and bounded in $\mathbb{C}^N \equiv \mathbb{R}^{2N}$ (Corollary 6.20)), so
$fS$ is also compact (thus closed in $\mathbb{R}$). One point that is outside $fS$ is 0,

\[ f(x) = 0 \Leftrightarrow \|Jx\| = 0 \Leftrightarrow Jx = 0 \Leftrightarrow x = 0. \]

Zero is therefore an exterior point contained in an open interval $]-c, c[$ outside $fS$.
This means that $c \le \|Ju\|$ for any unit vector $u$. Applying this to $u = x/\|x\|_1$, for any
(non-zero) vector $x \in \mathbb{C}^N$, we find $c\|x\|_1 \le \|Jx\|$ as required (Proposition 8.12).
Clearly, the proof does not depend critically on the use of complex rather than
real scalars. □


Proposition 8.23 Riesz’s theorem 


A subset $K$ of a normed space $X$ is totally bounded $\Leftrightarrow$ $K$ is bounded and
lies arbitrarily close to finite-dimensional subspaces, meaning

\[ \forall \epsilon > 0,\ \exists Y\ N\text{-dim subspace of } X,\ \forall x \in K,\ \|x + Y\| < \epsilon. \]


Balls are totally bounded only in finite-dimensional normed spaces. 


Proof (i) Let $K \subseteq \bigcup_{i=1}^{N} B_\epsilon(x_i)$ be a totally bounded set in the normed space $X$,
and let $Y := [[x_1, \ldots, x_N]]$. Any point $x \in K$ is covered by some ball $B_\epsilon(x_i)$, i.e.,
$\|x - x_i\| < \epsilon$, so that $\|x + Y\| = \inf_{y \in Y} \|x - y\| < \epsilon$. Since $\epsilon$ can be chosen
arbitrarily small, this proves one implication in the first statement.

In a finite-dimensional normed space, bounded sets are totally bounded: This is
true for $\mathbb{C}^N$ because balls (and their subsets) are totally bounded (Exercise 6.9(2)).
Any finite-dimensional space $Y$ has an isomorphism $J : \mathbb{C}^N \to Y$ by the previous
theorem. If $A$ is a bounded subset of $Y$, $J^{-1}A$ is a bounded set in $\mathbb{C}^N$ (Exercise
4.17(3)), hence totally bounded; mapping back to $Y$, $A = JJ^{-1}A$ is totally bounded
(Proposition 6.7).

For the converse of the proposition, suppose $K$ is bounded by $r$, and lies within $\epsilon$
of an $N$-dimensional subspace $Y$. This means that if $x \in K$ then $\|x\| \le r$, and there
is a $y \in Y$ such that $\|x - y\| < \epsilon$, so

\[ \|y\| \le \|x\| + \|y - x\| < r + \epsilon. \]

But we have just seen that the ball $B_{r+\epsilon}(0) \cap Y$ is totally bounded in $Y$, and can be
covered by a finite number of $\epsilon$-balls, $B_\epsilon(y_i)$, $i = 1, \ldots, n$. In particular, there is
some $y_i$ for which $\|y - y_i\| < \epsilon$, and so

\[ \|x - y_i\| \le \|x - y\| + \|y - y_i\| < 2\epsilon \quad\Rightarrow\quad K \subseteq \bigcup_{i=1}^{n} B_{2\epsilon}(y_i). \]

(ii) Suppose $X$ has a totally bounded ball, which by re-scaling and translation can
be taken to be the unit ball $B_X$ (Proposition 7.5). It must be within $\epsilon < \frac{1}{2}$ of a
finite-dimensional closed subspace $Y$. In fact $X = Y$, otherwise we can use Riesz's
lemma to find a vector $y \in B_X$ with $d(y, Y) = \|y + Y\| = \frac{1}{2} > \epsilon$. □


Examples 8.24 
1. All norms on $\mathbb{C}^N$ are equivalent.


2. Given a point $x \in X$ and a finite-dimensional subspace $M$, there is always
a best approximation $a \in M$ to $x$. We need only look in the compact ball
$B := B_{2\|x\|}[0] \cap M$, and since the function $a \mapsto \|a - x\|$ on it is continuous,
it achieves the minimum (Corollary 6.16).
For example, there is always a polynomial of degree at most $n$ that best
approximates a function with respect to any given norm.


3. If $M$ is a complete subspace of a normed space, and $N$ a finite-dimensional
subspace, then $M + N$ is complete (see Example 7.11(2)).


Proof It is enough to show that $M + [[e]]$ is complete when $e \notin M$; the result then
follows by induction. For any $x \in M$, $\alpha \in \mathbb{C}$,

\[ |\alpha|\|e + M\| = \|\alpha e + M\| \le \|\alpha e + x\|, \]
\[ \|x\| \le \|x + \alpha e\| + |\alpha|\|e\| \le c\|\alpha e + x\|. \]

So if $(x_n + \alpha_n e)$ is a Cauchy sequence in $M + [[e]]$, then so are $(\alpha_n)$ and $(x_n)$, in
$\mathbb{C}$ and $M$ respectively. Hence, $x_n + \alpha_n e \to x + \alpha e \in M + [[e]]$.


Exercises 8.25 


1. Totally bounded sets cannot be open (or have a proper interior) in an
infinite-dimensional normed space.


2. The set of polynomials of degree at most $n$ forms a closed linear subspace of
$L^1[a, b]$ with dimension $n + 1$; a basis for this space is $1, x, \ldots, x^n$.


3. As an illustration of Riesz's theorem, the unit ball in the infinite-dimensional space
$\ell^\infty$ (or $\ell^1$) is not totally bounded. (Hint: Show $(e_n)$ has no Cauchy subsequence.)


4. In finite-dimensional normed spaces only, the compact sets are the closed and
bounded ones.


5. Totally bounded sets need not lie in a finite-dimensional subspace, just arbitrarily
close to them. Can you think of an infinite-dimensional totally bounded set?

6. * A space generated by an infinite but countable number of linearly independent
vectors, $[[e_1, e_2, \ldots]]$, cannot be complete: the linear subspaces $[[e_1]], [[e_1, e_2]], \ldots$
are closed in it, and do not have an interior (why?), so by Baire's theorem their
union cannot be a complete metric space.


Remarks 8.26 


1. Continuous operators are widely referred to as being “bounded”; except for the 
zero operator, their image is certainly not bounded! The reason they are called so 
is that, being Lipschitz maps, they send bounded sets to bounded sets. The usage 
of “bounded” is avoided in this text, in favor of the equivalent term “continuous”. 


2. By analogy with matrices, it is customary to write $Tx$ instead of $T(x)$. This is a
slight abuse of notation; a linear map on the vector space of matrices need not act
on the left, e.g. $A \mapsto AB$, $A \mapsto AB + BA$, $A \mapsto A^\top$, and $A \mapsto BAB$ are all
linear.


3. For the initiated, the idea of continuous linear maps can be extended to continuous
multi-linear maps (tensors); they also form a Banach space with norm

\[ \|T\| := \sup \frac{|T(x_1, \ldots, \phi_1, \ldots)|}{\|x_1\| \cdots \|\phi_1\| \cdots}. \]

4. $B(X, Y)$ forms part of the larger space of Lipschitz functions $X \to Y$. For such
functions, $\|f\| := \sup_{x_1 \ne x_2 \in X} \|f(x_1) - f(x_2)\|/\|x_1 - x_2\|$ satisfies the norm
axioms, except that $\|f\| = 0 \Leftrightarrow f$ is constant.


Chapter 9 
Main Examples 


Having fleshed out a substantial amount of abstract theory, we turn to the concrete 
examples of normed spaces and identify which are complete and separable. Unavoid- 
ably, the proofs become more technical once we leave the familiarity of finite dimen- 
sions and enter the realm of infinite-dimensional spaces, having to deal as it were 
with sequences of sequences or functions and different types of norms. However, a 
careful study of this section will be rewarded by having an armory of spaces, so to 
speak, ready to serve as examples to confirm or refute conjectured statements. 


9.1 Sequence Spaces 


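The Space $\ell^\infty$


A sequence in $\ell^\infty$ is a sequence of sequences, $x_n = (a_{ni})$. Convergence in $\ell^\infty$ means
uniform convergence of the components, that is,

\[
\begin{aligned}
x_n \to 0 &\Leftrightarrow \sup_i |a_{ni}| \to 0 \text{ as } n \to \infty \\
&\Leftrightarrow |a_{ni}| \to 0 \text{ as } n \to \infty, \text{ uniformly for all components } i, \\
&\Leftrightarrow \forall \epsilon > 0,\ \exists N,\ \forall n \ge N,\ \forall i,\ |a_{ni}| < \epsilon.
\end{aligned}
\]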


For example, of the following three sequences of sequences, only the first converges 
to 0, even though each component converges to 0. 


(1, 1, 1, ...)          (1, 0, 0, ...)       (0, 1, 1, 1, ...)
(1/2, 1/2, 1/2, ...)    (0, 1, 0, 0, ...)    (0, 0, 1, 1, 1, ...)
(1/3, 1/3, 1/3, ...)    (0, 0, 1, 0, ...)    (0, 0, 0, 1, 1, ...)
       ↓                      ↓                     ↓
(0, 0, 0, 0, ...)       (0, 0, 0, 0, ...)    (0, 0, 0, 0, ...)
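
The difference between componentwise convergence and convergence in $\ell^\infty$ can be seen numerically. The sketch below is only an illustration under stated assumptions: sequences are truncated to their first `K` components, and the three families used are the constant sequences $(1/n, 1/n, \ldots)$, the unit sequences $e_n$, and the sequences with $n$ leading zeros followed by ones; the function names are hypothetical.

```python
# Sketch: sup-norms of three families of sequences, truncated to K components.
K = 1000   # truncation length (assumption): only the first K components are stored

def sup_norm(x):
    """The l-infinity norm of a (truncated) sequence."""
    return max(abs(a) for a in x)

def constant(n):
    """(1/n, 1/n, 1/n, ...)"""
    return [1.0 / n] * K

def unit(n):
    """e_n = (0, ..., 0, 1, 0, ...), with the 1 at position n."""
    return [1.0 if i == n else 0.0 for i in range(K)]

def tail(n):
    """(0, ..., 0, 1, 1, 1, ...), with n leading zeros."""
    return [0.0 if i < n else 1.0 for i in range(K)]

norms = {n: (sup_norm(constant(n)), sup_norm(unit(n)), sup_norm(tail(n)))
         for n in (1, 10, 100)}
print(norms)

# A fixed component i is eventually 0 in the second and third families.
i = 5
components = [(unit(n)[i], tail(n)[i]) for n in (10, 100)]
print(components)
```

Only the first family has sup-norm tending to 0; the other two keep norm 1 although each individual component is eventually 0.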
J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_9

Theorem 9.1


$\ell^\infty$ is complete but not separable.


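Proof (i) Let $(x_n)$ be a Cauchy sequence in $\ell^\infty$, i.e., $\|x_n - x_m\|_{\ell^\infty} \to 0$ as
$n, m \to \infty$. Note that $\|x_n\|_{\ell^\infty} \le c$ since Cauchy sequences are bounded
(Example 4.3(5)).

\[
\begin{array}{c|cccc|l}
x_1 & a_{11} & a_{12} & a_{13} & \cdots & \le \|x_1\|_{\ell^\infty} \\
x_2 & a_{21} & a_{22} & a_{23} & \cdots & \le \|x_2\|_{\ell^\infty} \\
\downarrow & \downarrow & \downarrow & \downarrow & & \\
x & a_1 & a_2 & a_3 & \cdots & \le c
\end{array}
\]

(The absolute signs of $a_{ni}$ are omitted in the horizontal rows.)

For each column $i$, $|a_{ni} - a_{mi}| \le \|x_n - x_m\|_{\ell^\infty} \to 0$, so $(a_{ni})$ is a Cauchy
sequence in $\mathbb{C}$, which converges to, say, $a_i := \lim_{n \to \infty} a_{ni}$.

That $x := (a_i)$ is in $\ell^\infty$ follows from taking the limit $n \to \infty$ of

\[ |a_{ni}| \le \|x_n\|_{\ell^\infty} \le c. \]

More crucially, $x_n \to x$ in $\ell^\infty$ since, for each column $i$ and any $n \in \mathbb{N}$, one can
choose an $m \ge n$ large enough that $|a_{mi} - a_i| < 1/n$, so that

\[ |a_i - a_{ni}| \le |a_i - a_{mi}| + |a_{mi} - a_{ni}| < \frac{1}{n} + \|x_m - x_n\|_{\ell^\infty} \to 0, \]

as $n \to \infty$, independently of $i$.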


(ii) To show $\ell^\infty$ is not separable we display an uncountable number of disjoint balls
(Exercise 4.21(4)). Consider the sequences that consist of 1s and 0s. The distance
between any two distinct ones is exactly 1, so that the balls centered on them with radius
1/2 are disjoint. Moreover, these sequences are uncountable for the same reason that
the real numbers are uncountable: If one were able to list them as

\[
\begin{aligned}
x_1 &= (a_{11}, a_{12}, a_{13}, \ldots) \\
x_2 &= (a_{21}, a_{22}, a_{23}, \ldots) \\
x_3 &= (a_{31}, a_{32}, a_{33}, \ldots) \\
&\ \ \vdots
\end{aligned}
\]

one could take the diagonal sequence $(a_{11}, a_{22}, \ldots)$, and swap its 1s and 0s, giving
a sequence $(1 - a_{nn})$ that cannot be in the list, for $1 - a_{nn} \ne a_{nn}$. □

Proposition 9.2


The space of convergent complex sequences, and of those sequences that
converge to 0,

\[ c := \{\, (a_n) : \exists a \in \mathbb{C},\ \lim_{n \to \infty} a_n = a \,\}, \]
\[ c_0 := \{\, (a_n) : \lim_{n \to \infty} a_n = 0 \,\}, \]

are complete separable subspaces of $\ell^\infty$.


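Proof The spaces are nested in each other as $c_0 \subset c \subset \ell^\infty$ since convergent
sequences are bounded. They are easily shown to be linear subspaces: $a_n + b_n \to
a + b$, $\lambda a_n \to \lambda a$ when $a_n \to a$ and $b_n \to b$ as $n \to \infty$.

$c_0$ is closed in $\ell^\infty$: Let $x_n \to x$ in $\ell^\infty$, with $x_n \in c_0$; their components converge
uniformly $a_{ni} \to a_i$ as $n \to \infty$.

\[
\begin{array}{c|cccc|l}
x_1 & a_{11} & a_{12} & a_{13} & \cdots & \to 0 \\
x_2 & a_{21} & a_{22} & a_{23} & \cdots & \to 0 \\
\downarrow & \downarrow & \downarrow & \downarrow & & \\
x & a_1 & a_2 & a_3 & \cdots & \overset{?}{\to} 0
\end{array}
\]

Now, for any $\epsilon > 0$, there is an $x_n$ in $c_0$ such that $\|x_n - x\|_{\ell^\infty} < \epsilon$, and for this
sequence, there is an integer $N$, such that

\[ i \ge N \Rightarrow |a_{ni}| < \epsilon. \]

It follows that for $i \ge N$,

\[ |a_i| \le |a_{ni}| + |a_i - a_{ni}| \le |a_{ni}| + \|x - x_n\|_{\ell^\infty} < 2\epsilon \]

so $\lim_{i \to \infty} a_i = 0$ and $x \in c_0$.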
$c_0$ is separable: The vectors $e_n := (\delta_{ni}) = (0, \ldots, 0, 1, 0, \ldots)$, with the 1
occurring at the $n$th position, form a Schauder basis for $c_0$: for any $x = (a_n) \in c_0$,

\[ \Bigl\| x - \sum_{n=0}^{N} a_n e_n \Bigr\|_{\ell^\infty} = \sup_{n > N} |a_n| \to 0, \quad \text{as } N \to \infty. \]

If $\sum_n a_n e_n = \sum_n b_n e_n$, then $(a_0 - b_0, a_1 - b_1, \ldots) = 0$ hence $a_n = b_n$ and the
coefficients are unique.

The spaces $c$ and $c_0$ are isomorphic: Let $J : c \to c_0 \subset \ell^\infty$ be defined by

\[ J(a_0, a_1, a_2, \ldots) := (-a, a_0 - a, a_1 - a, \ldots), \quad \text{where } a := \lim_{n \to \infty} a_n. \]

$J$ is 1-1 since

\[ J(a_n) = J(b_n) \Rightarrow a = b \text{ and } \forall n,\ a_n - a = b_n - b \Rightarrow (a_n) = (b_n). \]

$J$ is onto $c_0$ for, given any $y = (b_n) \in c_0$, it is clear that $x := (b_1 - b_0, b_2 - b_0, \ldots)$
maps to it. In fact, writing $\mathbf{1} := (1, 1, \ldots)$,

\[ Jx = Rx - a\mathbf{1}, \qquad J^{-1}y = Ly - b_0\mathbf{1}, \]

where $R$ and $L$ are the shift operators. This observation shows that both $J$ and $J^{-1}$
are continuous and linear since $(a_n) \mapsto a$, as well as $(b_n) \mapsto b_0$, are functionals:

\[ |a| = \bigl|\lim_{n \to \infty} a_n\bigr| = \lim_{n \to \infty} |a_n| \le \sup_n |a_n| = \|(a_n)\|_{\ell^\infty}, \]
\[ |b_0| \le \sup_n |b_n| = \|(b_n)\|_{\ell^\infty}. \]

It follows that $c$ has the same properties of completeness and separability that $c_0$
enjoys. □


Theorem 9.3 


Every functional on $c_0$ is of the type $(a_n) \mapsto \sum_n b_n a_n$ where $(b_n) \in \ell^1$, and

\[ c_0^* \cong \ell^1. \]


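Proof Given $y = (b_n) \in \ell^1$ and $x = (a_n) \in c_0$, the inequality

\[ |y \cdot x| = \Bigl| \sum_{n=0}^{\infty} b_n a_n \Bigr| \le \sum_{n=0}^{\infty} |b_n||a_n| \le \sup_n |a_n| \sum_{n=0}^{\infty} |b_n| = \|x\|_{\ell^\infty}\|y\|_{\ell^1} \]

shows that the linear map $y^\top : x \mapsto y \cdot x := \sum_{n=0}^{\infty} b_n a_n$ (Example 8.4(4)) is
well-defined and continuous on $\ell^\infty$ (including $c_0$), with $\|y^\top\| \le \|y\|_{\ell^1}$.

Every functional on $c_0$ is of this type: By the linearity and continuity of any
$\phi \in c_0^*$,

\[ \phi x = \phi\Bigl( \sum_{n=0}^{\infty} a_n e_n \Bigr) = \sum_{n=0}^{\infty} a_n \phi e_n = y \cdot x, \quad \text{where } b_n := \phi e_n,\ y := (b_n). \]

Also, writing $b_n = |b_n| e^{i\theta_n}$ in polar form,

\[ \sum_{n=0}^{N} |b_n| = \sum_{n=0}^{N} e^{-i\theta_n} b_n = \phi\Bigl( \sum_{n=0}^{N} e^{-i\theta_n} e_n \Bigr) \le \|\phi\| \Bigl\| \sum_{n=0}^{N} e^{-i\theta_n} e_n \Bigr\|_{c_0} = \|\phi\|, \]

hence $y \in \ell^1$, with $\|y\|_{\ell^1} \le \|\phi\| = \|y^\top\|$. Combined with the above, we get
$\|y\|_{\ell^1} = \|y^\top\|$.

Isometric isomorphism: Let $J : \ell^1 \to c_0^*$ be the map $y \mapsto y^\top$. The above
conclusions can be summarized as stating that $J$ is an onto isometry. That $J$ is linear
is easily seen from the following statement that holds for every $x \in c_0$, $u, v, y \in \ell^1$:

\[ (u + v) \cdot x = \sum_n (u_n + v_n) a_n = \sum_n u_n a_n + \sum_n v_n a_n = u \cdot x + v \cdot x, \]
\[ (\lambda y) \cdot x = \sum_n (\lambda b_n) a_n = \lambda \sum_n b_n a_n = \lambda (y \cdot x), \]

so $(u + v)^\top = u^\top + v^\top$ and $(\lambda y)^\top = \lambda y^\top$. □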


Exercises 9.4 


1. The kernel of the functional $\operatorname{Lim} : (a_n) \mapsto \lim_{n \to \infty} a_n$ on $c$, is $c_0$.


2. Any convergent complex sequence $a_n \to a$ can be written as

\[ (a_n) = \sum_n (a_n - a) e_n + a\mathbf{1}, \]

where $\mathbf{1} := (1, 1, \ldots)$. Deduce that the vectors $e_n$ together with $\mathbf{1}$ form a
Schauder basis for $c$; what is its dual space $c^*$?


3. ▶ One can multiply bounded sequences together as $(a_n)(b_n) := (a_n b_n)$, to
get another bounded sequence, $\|xy\|_{\ell^\infty} \le \|x\|_{\ell^\infty}\|y\|_{\ell^\infty}$. This multiplication is
commutative and associative, and has unity $\mathbf{1}$. Only those sequences which are
bounded away from 0 (i.e., $|a_n| \ge c > 0$) have an inverse, namely $(a_n)^{-1} = (a_n^{-1})$.

4. * The inequality $\|xy\|_{\ell^1} \le \|x\|_{\ell^\infty}\|y\|_{\ell^1}$ is also true, so the map $x \mapsto M_x$, where
$M_x y := xy$, embeds $\ell^\infty$ in $B(\ell^1)$.

5. The vector space $[[e_0, e_1, \ldots]]$ is often denoted by $c_{00}$; it is the space of sequences
with a finite number of non-zero terms. Its closure in the $\ell^\infty$-norm is $\overline{c_{00}} = c_0$.


6. $c_0$ contains the space of sequences $\ell^\infty_s := \{\, (a_n) : \exists c,\ \forall n \ge 1,\ |a_n| \le c/n^s \,\}$
($s > 0$). What is its closure? Can you think of a sequence which is in $c_0$ but not
in any $\ell^\infty_s$?


7. The distance between a sequence $(a_n) \in \ell^\infty$ and $c_0$ is $\limsup_n |a_n|$.


8. * $C[0, 1]$ can be embedded in $\ell^\infty$, since $f \in C[0, 1]$ is determined by its values
on the dense subset $\mathbb{Q} \cap [0, 1]$ which can be listed as a sequence $(q_n)$. Check that
the mapping $f \mapsto (f(q_n))$ is linear and isometric. The Banach-Mazur theorem
states that every separable Banach space is embedded in $C[0, 1]$.


The Space $\ell^1$


Convergence in $\ell^1$ is more stringent than in $\ell^\infty$. This can be seen by the inequality

\[ \forall x = (a_i) \in \ell^1, \quad \|x\|_{\ell^\infty} = \sup_i |a_i| = \max_i |a_i| \le \sum_i |a_i| = \|x\|_{\ell^1}, \]

so $x_n \to 0$ in $\ell^\infty$ does not guarantee $x_n \to 0$ in $\ell^1$. For the latter to occur, not
only must the components approximate 0 together, but their sum must also diminish.
Fewer sequences manage to do this, and this is reflected in the fact that $\ell^1$ is separable.
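
A concrete family shows the gap between the two notions of convergence. The sketch below assumes the hypothetical sequences $x_n := (1/n, \ldots, 1/n, 0, 0, \ldots)$, with $n$ non-zero terms (trailing zeros are not stored); the function names are illustrative.

```python
# Sketch: x_n -> 0 in l-infinity although ||x_n|| = 1 in l^1 for every n.

def norm_inf(x):
    """The l-infinity norm: largest absolute component."""
    return max(abs(a) for a in x)

def norm_1(x):
    """The l^1 norm: sum of absolute components."""
    return sum(abs(a) for a in x)

def flat(n):
    """x_n = (1/n, ..., 1/n, 0, 0, ...), with n non-zero terms."""
    return [1.0 / n] * n

rows = [(n, norm_inf(flat(n)), norm_1(flat(n))) for n in (1, 10, 100, 1000)]
for row in rows:
    print(row)
```

The sup-norms $1/n$ tend to 0, while the $\ell^1$-norms stay at 1 (up to rounding), in agreement with $\|x\|_{\ell^\infty} \le \|x\|_{\ell^1}$.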


Theorem 9.5 
$\ell^1$ is complete and separable.


Proof (i) Since $\ell^1 \cong c_0^*$, one can argue that $\ell^1$ is complete, as are all dual spaces
(Theorem 8.7).

Alternatively, the following direct proof shows that every absolutely summable
series in $\ell^1$ converges (Proposition 7.21). (Note: as $\ell^1$ is defined in terms of sums,
it is more straightforward to use series instead of Cauchy sequences.) Suppose
$x_1 + x_2 + \cdots$ is a series such that $\sum_n \|x_n\|_{\ell^1} = s$. In the following diagram, we
will show convergence of the various vertical sums.

\[
\begin{array}{c|cccccc|c}
x_1 & a_{10} & + & a_{11} & + & a_{12} & + \cdots & \|x_1\|_{\ell^1} \\
+ & + & & + & & + & & + \\
x_2 & a_{20} & + & a_{21} & + & a_{22} & + \cdots & \|x_2\|_{\ell^1} \\
+ & + & & + & & + & & + \\
\vdots & \vdots & & \vdots & & \vdots & & \vdots \\
\downarrow & \downarrow & & \downarrow & & \downarrow & & \downarrow \\
x & a_0 & & a_1 & & a_2 & \cdots & s
\end{array}
\]

(Note that the absolute signs of $a_{ni}$ are omitted in the horizontal sums.)
The main point of the proof is that any rectangular sum of terms in this array is
less than the corresponding sum on the right-hand column:

\[ \sum_{i=I}^{J} \Bigl| \sum_{n=M}^{N} a_{ni} \Bigr| \le \sum_{i=I}^{J} \sum_{n=M}^{N} |a_{ni}| \le \sum_{n=M}^{N} \|x_n\|_{\ell^1}. \]

In particular, taking the $i$th column, $|\sum_n a_{ni}| \le \sum_n |a_{ni}| \le s$ shows that it converges
in $\mathbb{C}$ to, say, $a_i := \sum_{n=1}^{\infty} a_{ni}$. In fact, the whole array sum is bounded, $\sum_i |a_i| =
\sum_i |\sum_{n=1}^{\infty} a_{ni}| \le s$, so that $x := (a_i)$ belongs to $\ell^1$.

Finally, note that any rectangular sum goes to 0 as it moves downward, because
$\sum_{n=N}^{\infty} \|x_n\|_{\ell^1} \to 0$ as $N \to \infty$. Hence

\[ \Bigl\| x - \sum_{n=1}^{N} x_n \Bigr\|_{\ell^1} = \sum_{i=0}^{\infty} \Bigl| a_i - \sum_{n=1}^{N} a_{ni} \Bigr| = \sum_{i=0}^{\infty} \Bigl| \sum_{n=N+1}^{\infty} a_{ni} \Bigr| \to 0 \]

giving $x = \sum_{n=1}^{\infty} x_n$.
(ii) The sequences $e_n := (0, \ldots, 0, 1, 0, \ldots)$, with the 1 occurring at the $n$th position,
form a Schauder basis because for any vector $x = (a_n) \in \ell^1$,

\[
\begin{aligned}
\Bigl\| x - \sum_{n=0}^{N} a_n e_n \Bigr\|_{\ell^1} &= \|(a_0, a_1, \ldots) - (a_0, \ldots, a_N, 0, 0, \ldots)\|_{\ell^1} \\
&= \|(0, \ldots, 0, a_{N+1}, \ldots)\|_{\ell^1} \\
&= \sum_{n=N+1}^{\infty} |a_n| \to 0 \quad \text{as } N \to \infty
\end{aligned}
\]

since $\sum_n |a_n|$ converges. If $x = \sum_{n=0}^{\infty} b_n e_n$ as well, then $b_m = e_m \cdot x = a_m$ for
each $m \in \mathbb{N}$, so the $e_n$ form a Schauder basis. □


Proposition 9.6 


Every functional on $\ell^1$ is of the type $(a_n) \mapsto \sum_n b_n a_n$ where $(b_n) \in \ell^\infty$,
and

\[ \ell^{1*} \cong \ell^\infty. \]

Proof The proof is practically identical to the one for $c_0^* \cong \ell^1$, except that now
$y = (b_n) \in \ell^\infty$ and $x = (a_n) \in \ell^1$. The inequality

\[ |y \cdot x| \le \sum_n |b_n||a_n| \le \sup_n |b_n| \sum_n |a_n| = \|y\|_{\ell^\infty}\|x\|_{\ell^1} \]

shows that the linear mapping $y^\top : \ell^1 \to \mathbb{C}$ is well-defined and continuous with
$\|y^\top\| \le \|y\|_{\ell^\infty}$.

Every functional on $\ell^1$ is of this type: Let $\phi \in \ell^{1*}$, then by linearity and continuity
of $\phi$,

\[ \phi x = \phi\Bigl( \sum_{n=0}^{\infty} a_n e_n \Bigr) = \sum_{n=0}^{\infty} a_n b_n = y \cdot x, \quad \text{where } b_n := \phi e_n,\ y := (b_n). \]

Moreover $|b_n| = |\phi e_n| \le \|\phi\|\|e_n\|_{\ell^1} = \|\phi\|$ so that $y \in \ell^\infty$, with $\|y\|_{\ell^\infty} \le \|\phi\|$.
As $\phi = y^\top$, $\|y\|_{\ell^\infty} = \|y^\top\|$.


Isomorphism: The mapping $J : \ell^\infty \to \ell^{1*}$, $y \mapsto y^\top$, is linear and the above
assertions state that $J$ is an onto isometry. □


Exercises 9.7 


1. Suppose each coefficient of $x_n = (a_{ni}) \in \ell^1$ converges, $a_{ni} \to a_i$ as $n \to \infty$,
and let $x := (a_i) \in \ell^1$; then it does not follow that $x_n \to x$ in $\ell^1$, e.g. $e_n \not\to 0$.
But if $|a_{ni} - a_i|$ is decreasing with $n$ (for each $i$), then $x_n \to x$ in $\ell^1$.


2. ▶ $\ell^1$ has a natural product, called convolution:

\[ (a_n) * (b_n) := \Bigl( a_0 b_0,\ a_1 b_0 + a_0 b_1,\ a_2 b_0 + a_1 b_1 + a_0 b_2,\ \ldots,\ \sum_{i=0}^{n} a_{n-i} b_i,\ \ldots \Bigr). \]

This is indeed in $\ell^1$ because the sum to $n$ terms (a triangle of terms $a_i b_j$) is less
than $(|a_0| + \cdots + |a_n|)(|b_0| + \cdots + |b_n|)$ (a square of terms), so that

\[ \|x * y\|_{\ell^1} \le \|x\|_{\ell^1}\|y\|_{\ell^1}. \]

Convolution is commutative and associative, and $e_0$ acts as the identity element
$e_0 * x = x$. The inverse of $(1, a, 0, \ldots)$ is $(1, -a, a^2, -a^3, \ldots)$, which is in $\ell^1$
only when $|a| < 1$.


3. If $x \in \ell^1$, but $y \in \ell^\infty$, then $x * y$ is a bounded sequence

\[ \|x * y\|_{\ell^\infty} \le \|x\|_{\ell^1}\|y\|_{\ell^\infty}. \]

4. The right-shift operator can be written as a convolution $Rx = e_1 * x$. In general,
$R^n x = e_n * x$, since $e_n * e_m = e_{n+m}$. The "running average" of a "time-series"
$x$ is $\frac{1}{N}(1, \ldots, 1, 0, \ldots) * x$.


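5. * A subset $K$ of $\ell^1$ is totally bounded $\Leftrightarrow$ it is bounded and

\[ \forall \epsilon > 0,\ \exists N \in \mathbb{N},\ \exists n_1, \ldots, n_N,\ \forall x \in K,\ \|x|_{\mathbb{N} \setminus \{n_1, \ldots, n_N\}}\|_{\ell^1} < \epsilon. \]

(Recall that $K$ lies arbitrarily close to finite-dimensional subspaces.)


6. $\ell^1$ has the functional $\operatorname{Sum}(b_n) := \sum_{n=0}^{\infty} b_n$. It corresponds to the bounded
sequence $\mathbf{1} = (1, 1, \ldots)$, i.e., $\operatorname{Sum} x = \mathbf{1} \cdot x$. Hence if $\sum_{n,i} |a_{ni}| < \infty$ then

\[ \sum_i \sum_n a_{ni} = \sum_n \sum_i a_{ni}. \]


7. The functionals $\delta_N(a_n) := a_N$ correspond to $e_N \in \ell^\infty$, i.e., $\delta_N x = e_N \cdot x$.
Similarly, the sum $\operatorname{Sum}_N(a_n) := \sum_{n=0}^{N} a_n$ corresponds to $e_0 + \cdots + e_N =
(1, \ldots, 1, 0, \ldots)$. Since $(1, \ldots, 1, 0, \ldots) \not\to \mathbf{1}$ in $\ell^\infty$, we also have $\operatorname{Sum}_N \not\to
\operatorname{Sum}$ in $\ell^{1*}$, yet $\operatorname{Sum}_N(x) \to \operatorname{Sum}(x)$ for any sequence $x \in \ell^1$. We'll discuss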
this apparent paradox in a later section (Section 11.5). 
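
The convolution product of Exercises 2 and 4 can be tried out on truncated sequences. The sketch below makes several assumptions: sequences are cut off after `N` terms, the value $r = 0.5$ is an arbitrary choice with $|r| < 1$, and the names `convolve` and `norm_1` are illustrative helpers.

```python
# Sketch: convolution on l^1, using finitely many terms of each sequence.

def convolve(a, b):
    """First min(len(a), len(b)) terms of a * b, where c_n = sum_{i=0}^{n} a_i b_{n-i}."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

def norm_1(a):
    """The l^1 norm of a finite list of terms."""
    return sum(abs(t) for t in a)

N = 40                                   # truncation length (assumption)
r = 0.5                                  # any |r| < 1 keeps the inverse in l^1
x = [1.0, r] + [0.0] * (N - 2)           # (1, r, 0, 0, ...)
y = [(-r) ** n for n in range(N)]        # (1, -r, r^2, -r^3, ...)
e0 = [1.0] + [0.0] * (N - 1)
e1 = [0.0, 1.0] + [0.0] * (N - 2)

product = convolve(x, y)                 # equals e0, the identity for convolution
shifted = convolve(e1, y)                # equals the right shift (0, y_0, y_1, ...)

# Two finitely supported sequences, padded so that no term of u * v is cut off.
u = [1.0, -2.0, 3.0, 0.0, 0.0]
v = [4.0, 5.0, -6.0, 0.0, 0.0]
lhs, rhs = norm_1(convolve(u, v)), norm_1(u) * norm_1(v)
print(product[:4], shifted[:4], lhs, rhs)
```

For the last pair, $u * v = (4, -3, -4, 27, -18, 0, \ldots)$, so $\|u * v\|_{\ell^1} = 56 \le 90 = \|u\|_{\ell^1}\|v\|_{\ell^1}$.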


The Space $\ell^2$
This normed space has properties that are, in many respects, midway between $\ell^1$ and
$\ell^\infty$. Yet it stands out, as it has a dot product $x \cdot y$ defined for any two of its sequences,
and $\bar{x} \cdot x = \|x\|^2$; we will have much more to say about normed spaces with such
dot products in the next chapter.


Theorem 9.8 
$\ell^2$ is complete and separable.
Proof (i) Let $x_n = (a_{ni})$ be a Cauchy sequence in $\ell^2$; the terms are uniformly
bounded $\|x_n\| \le c$. For each $i$,

\[ |a_{ni} - a_{mi}|^2 \le \sum_i |a_{ni} - a_{mi}|^2 = \|x_n - x_m\|^2 \to 0 \text{ as } n, m \to \infty, \]

so $(a_{ni})$ is a complex Cauchy sequence which converges to, say, $a_i := \lim_{n \to \infty} a_{ni}$.
The sequence $x := (a_i)$ belongs to $\ell^2$ by taking the limit $N \to \infty$ of

\[ \sum_{i=0}^{N} |a_i|^2 = \lim_{n \to \infty} \sum_{i=0}^{N} |a_{ni}|^2 \le \lim_{n \to \infty} \|x_n\|^2 \le c^2. \]

As $(x_n)$ is Cauchy, for each $\epsilon > 0$ there is a positive integer $M$ such that

\[ n, m \ge M \Rightarrow \|x_n - x_m\| < \epsilon. \]


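Moreover, for each $i \in \mathbb{N}$, there exists an integer $M_i$ such that

\[ m \ge M_i \Rightarrow |a_{mi} - a_i|^2 < \epsilon^2/2^i. \]

Therefore, for any $N \in \mathbb{N}$, picking $m$ larger than $M, M_0, M_1, \ldots, M_N$, gives

\[ \sqrt{\sum_{i=0}^{N} |a_{ni} - a_i|^2} \le \sqrt{\sum_{i=0}^{N} |a_{ni} - a_{mi}|^2} + \sqrt{\sum_{i=0}^{N} |a_{mi} - a_i|^2} < \epsilon + \sqrt{2}\,\epsilon \]

which implies $\|x_n - x\| \le 3\epsilon$ for $n \ge M$.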
(ii) For separability, $\ell^2$ has the Schauder basis $e_n$, since for any $x = (a_n) \in \ell^2$,

\[ \Bigl\| x - \sum_{n=0}^{N} a_n e_n \Bigr\|_{\ell^2} = \|(0, \ldots, 0, a_{N+1}, \ldots)\|_{\ell^2} = \sqrt{\sum_{n=N+1}^{\infty} |a_n|^2} \to 0. \]

Uniqueness of the coefficients follows as in the proof of Theorem 9.5. □


Proposition 9.9 


Every functional on $\ell^2$ is of the type $(a_n) \mapsto \sum_n b_n a_n$ where $(b_n) \in \ell^2$,
and

\[ \ell^{2*} \cong \ell^2. \]

'Proof'. The argument is so similar to the previous ones about $c_0^*$ and $\ell^{1*}$ that it is
left as an exercise (use Cauchy's inequality at one point).


Exercises 9.10 
1. Show that $|x \cdot y| = \|x\|\|y\|$ if, and only if, $y$ is a multiple of $x$ (or $x = 0$).
2. The map $(a_1, \ldots, a_N) \mapsto (a_1, \ldots, a_N, 0, 0, \ldots)$ embeds $\mathbb{C}^N$ in $\ell^2$.


3. $\ell^2$ contains the interesting compact convex set $\{\, (a_n) : |a_n| \le 1/n \,\}$, called the
Hilbert cube. It is totally bounded in $\ell^2$, as it is close within any $\epsilon$ to a
finite-dimensional space $\{\, (a_n) : \forall n > N_\epsilon,\ a_n = 0 \,\}$, yet it is infinite-dimensional; it
cannot enclose any ball (else the ball would be totally bounded).


4. ▶ The various sequence spaces are subsets of each other as follows:

\[ c_{00} \subset \ell^1 \subset \ell^2 \subset c_0 \subset c \subset \ell^\infty, \quad \text{because } \|x\|_{\ell^\infty} \le \|x\|_{\ell^2} \le \|x\|_{\ell^1}, \]

but $\ell^1 \subset \ell^2 \subset c_0$ are not Banach space embeddings! Show further that $c_{00}$ with
the respective norms is dense in $\ell^1$, $\ell^2$, and $c_0$ ($c_{00}$ cannot be complete in any
norm, Exercise 8.25(6)).

The Space $\ell^p$


The space €? := { (a,) : ad, € C, >, |an|? < 00}, p > 1, is endowed with addition 
and scalar multiplication like the other sequence spaces, and the norm 


[ee 


1/p 
Weller = (Solan?) 


n=0 


Our aim in this section is to prove the triangle inequality for this norm, otherwise 
known as Minkowski’s inequality, and show ¢? is complete and separable. 

As the reader is probably becoming aware, it is inequalities that are at the heart 
of most proofs about continuity, including isomorphisms. They can be thought of as 
a ‘process’ transforming numbers from one form to another, perhaps more useful, 
form, but losing some information on the way. Much like tools to be chosen with 
care, some are “sharper” than others. (See [8] for much more.) The following three 
inequalities are continually used in analysis. The first is a gem, simple yet rich:

\[ a^\alpha b^\beta \le \alpha a + \beta b, \quad\text{for } \alpha, \beta, a, b \ge 0,\ \alpha + \beta = 1. \tag{9.1} \]

This inequality states that any weighted geometric mean is less than or equal to the same-weighted arithmetic mean. The special case $\sqrt{ab} \le (a + b)/2$ has already been encountered previously. Writing $a = e^x$, $b = e^y$ gives

\[ e^{\alpha x + \beta y} \le \alpha e^x + \beta e^y. \]

This is equivalent to the convexity of the exponential function, and can be taken as its proof (any real function with a positive second derivative is convex).

The same idea applied to the convexity of $x^p$, $p \ge 1$, gives

\[ (\alpha a + \beta b)^p \le \alpha a^p + \beta b^p, \quad\text{for } \alpha, \beta, a, b \ge 0,\ \alpha + \beta = 1. \tag{9.2} \]

A third inequality of importance is

\[ a^p + b^p \le (a + b)^p, \quad\text{for } p \ge 1,\ a, b \ge 0. \tag{9.3} \]

Its normalized form $1 + t^p \le (1 + t)^p$, for $t = b/a \ge 0$, can be obtained by comparing their derivatives $p\,t^{p-1} \le p(1+t)^{p-1}$, as they start from the same value at $t = 0$.
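The three inequalities (9.1), (9.2) and (9.3) can be spot-checked numerically. The following is only an illustrative sketch: the helper names `leq` and `check_basic_inequalities`, the sampling ranges and the tolerance are assumptions made here, and a random test is of course no substitute for the convexity arguments above.

```python
import random

def leq(x, y):
    """x <= y up to a small relative tolerance for floating-point rounding."""
    return x <= y + 1e-9 * max(1.0, abs(y))

def check_basic_inequalities(a, b, alpha, p):
    """Return True if (9.1), (9.2) and (9.3) hold for the given numbers."""
    beta = 1.0 - alpha
    ineq_91 = leq(a**alpha * b**beta, alpha * a + beta * b)
    ineq_92 = leq((alpha * a + beta * b) ** p, alpha * a**p + beta * b**p)
    ineq_93 = leq(a**p + b**p, (a + b) ** p)
    return ineq_91 and ineq_92 and ineq_93

random.seed(0)
samples = [
    (random.uniform(0, 5), random.uniform(0, 5),   # a, b >= 0
     random.uniform(0, 1), random.uniform(1, 4))   # alpha in [0, 1], p >= 1
    for _ in range(1000)
]
all_hold = all(check_basic_inequalities(*s) for s in samples)
print(all_hold)
```

Calling the checker with an exponent below 1, for instance `check_basic_inequalities(1.0, 4.0, 0.5, 0.5)`, returns `False`, because (9.2) relies on the convexity of $x^p$, which fails for $p < 1$.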
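Proposition 9.11

For $a, b, \alpha, \beta > 0$, $\alpha + \beta = 1$, $p \ge 1$, $q > 0$,

\[
\begin{aligned}
\min(a, b) &\le (\alpha a^{-q} + \beta b^{-q})^{-1/q} && \text{harmonic mean} \\
&\le a^\alpha b^\beta && \text{geometric mean} \\
&\le (\alpha a^{1/p} + \beta b^{1/p})^p \\
&\le \alpha a + \beta b && \text{arithmetic mean} \\
&\le \sqrt[p]{\alpha a^p + \beta b^p} && \text{root-mean-``square''} \\
&\le \max(a, b).
\end{aligned}
\]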


Proof (i) If $a \le b$ (without loss of generality), then $a^q \le b^q$, so

\[ \frac{\alpha}{a^q} + \frac{\beta}{b^q} \le \frac{\alpha}{a^q} + \frac{\beta}{a^q} = \frac{1}{a^q}, \]

which is equivalent to the first inequality of the proposition.

(ii) The second inequality is equivalent to $a^{-\alpha q} b^{-\beta q} \le \alpha a^{-q} + \beta b^{-q}$, which is (9.1) with $a$, $b$ replaced by $a^{-q}$, $b^{-q}$ respectively.

(iii) Similarly, the third inequality is essentially $a^{\alpha/p} b^{\beta/p} \le \alpha a^{1/p} + \beta b^{1/p}$, which is (9.1) with $a$, $b$ replaced by $a^{1/p}$, $b^{1/p}$ respectively.

(iv) If $a$, $b$ in (9.2) are substituted by $a^{1/p}$ and $b^{1/p}$ one obtains $(\alpha a^{1/p} + \beta b^{1/p})^p \le \alpha a + \beta b$.

(v) The fifth inequality is precisely (9.2), while the sixth one follows easily if we assume, say, $a \le b$; for then $a^p \le b^p$, so $\alpha a^p + \beta b^p \le (\alpha + \beta) b^p = b^p$.

Substituting $q/p$ for $p$ in (9.3), when $p \le q$, and $a^p$ for $a$, $b^p$ for $b$, yields

\[ (a^q + b^q)^{1/q} \le (a^p + b^p)^{1/p} \quad\text{for } 0 < p \le q, \]
and furthermore, substituting $\alpha^{1/q} a$ for $a$ and $\beta^{1/q} b$ for $b$ in this inequality, gives

\[ (\alpha a^q + \beta b^q)^{1/q} \le (\alpha^{p/q} a^p + \beta^{p/q} b^p)^{1/p} \]

which is implicit in the scheme of inequalities above. □


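An induction proof generalizes all these inequalities to arbitrary sums or products,

\[ a_1^{\alpha_1} \cdots a_n^{\alpha_n} \le \alpha_1 a_1 + \cdots + \alpha_n a_n \le \sqrt[p]{\alpha_1 a_1^p + \cdots + \alpha_n a_n^p}, \tag{9.4} \]

when $a_i, \alpha_i \ge 0$, $\alpha_1 + \cdots + \alpha_n = 1$, $p \ge 1$, as well as

\[ \sqrt[q]{a_1^q + \cdots + a_n^q} \le \sqrt[p]{a_1^p + \cdots + a_n^p}, \quad\text{for } p \le q. \]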
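The chain of means in Proposition 9.11, in its $n$-term form, can be illustrated with a weighted power mean. This is a minimal sketch under stated assumptions: the function `power_mean`, the sample values and the weights are hypothetical choices made here, with the exponent $0$ interpreted as the geometric mean.

```python
import math

def power_mean(values, weights, r):
    """Weighted power mean of positive numbers; r = 0 gives the geometric mean."""
    if r == 0:
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    return sum(w * v**r for v, w in zip(values, weights)) ** (1.0 / r)

values = [1.0, 2.0, 4.0, 8.0]
weights = [0.1, 0.2, 0.3, 0.4]     # non-negative, summing to 1
p, q = 3.0, 2.0                    # p >= 1 and q > 0, as in Proposition 9.11

chain = [
    min(values),
    power_mean(values, weights, -q),        # "harmonic" mean
    power_mean(values, weights, 0),         # geometric mean
    power_mean(values, weights, 1.0 / p),
    power_mean(values, weights, 1),         # arithmetic mean
    power_mean(values, weights, p),         # root-mean-"square"
    max(values),
]
is_increasing = all(x <= y + 1e-12 for x, y in zip(chain, chain[1:]))
print(chain, is_increasing)
```

With these particular numbers the geometric mean is $2^{0.2 + 0.6 + 1.2} = 4$ and the arithmetic mean is $4.9$, so the middle of the chain can be verified by hand.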
Hermann Minkowski (1864-1907) studied under Lindemann (of π-transcendentality fame) at the University of Königsberg, together with Hilbert. At 19 years of age, two years before he graduated with a thesis on quadratic forms, he had already won the prestigious French Academy's Grand Prix. Starting 1889, he developed his "geometry of numbers" ideas on lattices, including his inequality. After teaching in Zurich (where Einstein was a student), he moved to Göttingen, became interested in physics and presented his version of special relativity as a unified space-time.

Fig. 9.1 Minkowski

This last inequality remains valid for infinite sums, $\|x\|_{\ell^q} \le \|x\|_{\ell^p}$ when $p \le q$, implying $\ell^p \subseteq \ell^q$. It shows that a bounded sequence lies in a whole range of $\ell^p$ spaces, down to some infimum $p$.


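Proposition 9.12 Minkowski's inequality

\[ \|x + y\|_{\ell^p} \le \|x\|_{\ell^p} + \|y\|_{\ell^p}, \quad\text{where } 1 \le p \le \infty. \]

Proof All norms in this proof are taken to be the $\ell^p$-norm. Let $u = (a_n)$ and $v = (b_n)$ be two sequences in $\ell^p$. Summing the inequality $(\alpha|a| + \beta|b|)^p \le \alpha|a|^p + \beta|b|^p$ ($\alpha + \beta = 1$, $\alpha, \beta \ge 0$) for a sequence of terms gives

\[ \sum_n |\alpha a_n + \beta b_n|^p \le \sum_n (\alpha|a_n| + \beta|b_n|)^p \le \alpha \sum_n |a_n|^p + \beta \sum_n |b_n|^p, \]

or $\|\alpha u + \beta v\|^p \le \alpha\|u\|^p + \beta\|v\|^p$.

Substituting $u = x/\|x\|$, $v = y/\|y\|$, $\alpha = \|x\|/(\|x\| + \|y\|)$, $\beta = \|y\|/(\|x\| + \|y\|)$, gives

\[ \frac{\|x + y\|}{\|x\| + \|y\|} = \|\alpha u + \beta v\| \le (\alpha + \beta)^{1/p} = 1. \]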


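Proposition 9.13 Hölder's inequality

\[ |x \cdot y| \le \|x\|_{\ell^p} \|y\|_{\ell^{p'}}, \quad\text{where } \frac{1}{p} + \frac{1}{p'} = 1. \]

Proof Substitute $a^{1/\alpha}$ and $b^{1/\beta}$ instead of $a$ and $b$ in $a^\alpha b^\beta \le \alpha a + \beta b$, with $\alpha = 1/p$, $\beta = 1/p'$, to get

\[ ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}. \tag{9.5} \]

Summing this for a sequence of terms in $\mathbb{C}$, with $x = (a_n)$ and $y = (b_n)$, leads to

\[ \Bigl| \sum_n a_n b_n \Bigr| \le \sum_n |a_n||b_n| \le \sum_n \Bigl( \frac{|a_n|^p}{p} + \frac{|b_n|^{p'}}{p'} \Bigr) = \frac{1}{p}\|x\|_{\ell^p}^p + \frac{1}{p'}\|y\|_{\ell^{p'}}^{p'}. \]

In particular, for unit vectors $u = x/\|x\|_{\ell^p}$, $v = y/\|y\|_{\ell^{p'}}$, we get Hölder's inequality,

\[ \frac{|x \cdot y|}{\|x\|_{\ell^p} \|y\|_{\ell^{p'}}} \le \frac{1}{p} + \frac{1}{p'} = 1. \]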
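Both Hölder's and Minkowski's inequalities are easy to test on finite sequences. The sketch below is illustrative only: the helper names `lp_norm` and `dot`, the exponent $p = 3$ and the random complex data are assumptions made here, and `dot` follows the text's product $x \cdot y = \sum_n a_n b_n$ without conjugation.

```python
import random

def lp_norm(x, p):
    """l^p norm of a finite sequence of complex numbers, 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def dot(x, y):
    """The product x . y = sum_n a_n b_n (no complex conjugation)."""
    return sum(a * b for a, b in zip(x, y))

random.seed(1)
p = 3.0
p_conj = p / (p - 1.0)             # conjugate index p', so that 1/p + 1/p' = 1
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

holder_ok = abs(dot(x, y)) <= lp_norm(x, p) * lp_norm(y, p_conj) + 1e-12
x_plus_y = [a + b for a, b in zip(x, y)]
minkowski_ok = lp_norm(x_plus_y, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12
print(holder_ok, minkowski_ok)
```

Finite sequences embed in $\ell^p$ by padding with zeros, so these checks are instances of Propositions 9.12 and 9.13 rather than separate finite-dimensional facts.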


Proposition 9.14

For $p > 1$, $\ell^p$ is a separable Banach space, with $\ell^{p*} \equiv \ell^{p'}$, where $\frac{1}{p} + \frac{1}{p'} = 1$.

Proof Minkowski's inequality is the non-trivial part in showing that $\ell^p$ is indeed a normed space. It is separable with the Schauder basis $e_n$, since for any $x = (a_n) \in \ell^p$, the series $\sum_n |a_n|^p$ converges to $\|x\|_{\ell^p}^p$, so

\[ \Bigl\| x - \sum_{n=0}^N a_n e_n \Bigr\|_{\ell^p}^p = \|(0, \ldots, 0, a_{N+1}, \ldots)\|_{\ell^p}^p = \sum_{n=N+1}^\infty |a_n|^p \to 0, \]

so $x = \sum_n a_n e_n$. The coefficients are unique since if $x = \sum_n b_n e_n = (b_0, b_1, \ldots)$, then $b_n = a_n$.

Dual of $\ell^p$: Any vector $y \in \ell^{p'}$ acts on $\ell^p$ via $x \mapsto y \cdot x$, with the latter being finite by Hölder's inequality $|y \cdot x| \le \|y\|_{\ell^{p'}} \|x\|_{\ell^p}$. By Exercise 2 below, there is an $x \in \ell^p$ which makes this an equality. Thus $\|y^\top\| = \|y\|_{\ell^{p'}}$.

Conversely, let $\phi$ be a functional on $\ell^p$; then $\phi x = \sum_{n=0}^\infty a_n b_n = y \cdot x$, where $b_n := \phi e_n$, $y = (b_n)$. Writing $b_n = |b_n| e^{i\theta_n}$ and noting $p(p' - 1) = p'$,

\[ \sum_{n=0}^N |b_n|^{p'} = \sum_{n=0}^N b_n e^{-i\theta_n} |b_n|^{p'-1} = \Bigl| \phi\bigl( (e^{-i\theta_n} |b_n|^{p'-1})_{n \le N} \bigr) \Bigr| \le \|\phi\| \Bigl( \sum_{n=0}^N |b_n|^{p'} \Bigr)^{1/p}. \]

Dividing by the right-hand series gives $\bigl( \sum_{n=0}^N |b_n|^{p'} \bigr)^{1/p'} \le \|\phi\|$; as $N$ is arbitrary, $y \in \ell^{p'}$.

Completeness: In common with all dual spaces, $\ell^p \equiv \ell^{p'*}$ is complete (or from an argument similar to the one for $\ell^2$). □


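Exercises 9.15

1. Given $1 \le q < p$, find an example of a sequence which is in $\ell^p$ but not in $\ell^q$.

2. For each $y \in \ell^{p'}$, find a sequence $x \in \ell^p$ which makes Hölder's inequality an equality.

3. Generalized Hölder's inequalities

\[ \|xy\|_{\ell^r} \le \|x\|_{\ell^p} \|y\|_{\ell^q}, \quad\text{where } \frac{1}{p} + \frac{1}{q} = \frac{1}{r}, \]

\[ \Bigl| \sum_n a_n b_n c_n \Bigr| \le \|(a_n)\|_{\ell^p} \|(b_n)\|_{\ell^q} \|(c_n)\|_{\ell^r}, \quad\text{where } \frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1. \]

(Hint: Apply Hölder's inequality to the product $|a_n|^r |b_n|^r$.)

4. Littlewood's inequality: $\|x\|_{\ell^r} \le \|x\|_{\ell^p}^{\alpha} \|x\|_{\ell^q}^{1-\alpha}$, where $\frac{1}{r} = \frac{\alpha}{p} + \frac{1-\alpha}{q}$.

(Hint: Apply the generalized Hölder's inequality above to $|a_n|^\alpha |a_n|^{1-\alpha}$, using $p/\alpha$ and $q/(1-\alpha)$ instead of $p$ and $q$.)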


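5. * Young's inequality:

\[ \|x * y\|_{\ell^r} \le \|x\|_{\ell^p} \|y\|_{\ell^q}, \]

where $1 + \frac{1}{r} = \frac{1}{p} + \frac{1}{q}$, $p, q \ge 1$ (Exercises 9.7(2,3)).
Justify the steps of the following proof. First note that $\frac{1}{p'} + \frac{1}{q'} + \frac{1}{r} = 1$ (where $\frac{1}{p'} = 1 - \frac{1}{p}$, etc.); then using the second generalized Hölder's inequality above on the positive numbers $a_n$, $b_n$, $c_n$, and an exquisite juggling of indices (where $k := n - m$, and all indices range over $0, \ldots, N$),

\[
\begin{aligned}
\sum_{n=0}^N \sum_{m=0}^n a_{n-m} b_m c_n &= \sum_{n=0}^N \sum_{m=0}^n (a_{n-m}^p b_m^q)^{1/r} (a_{n-m}^p c_n^{r'})^{1/q'} (b_m^q c_n^{r'})^{1/p'} \\
&\le \Bigl( \sum_{m,k} a_k^p b_m^q \Bigr)^{1/r} \Bigl( \sum_{n,k} a_k^p c_n^{r'} \Bigr)^{1/q'} \Bigl( \sum_{n,m} b_m^q c_n^{r'} \Bigr)^{1/p'} \\
&= \Bigl( \sum_{n=0}^N a_n^p \Bigr)^{1/p} \Bigl( \sum_{n=0}^N b_n^q \Bigr)^{1/q} \Bigl( \sum_{n=0}^N c_n^{r'} \Bigr)^{1/r'}.
\end{aligned}
\]

Hence if $(c_n) \in \ell^{r'}$, $(a_n) \in \ell^p$, and $(b_n) \in \ell^q$, then $(a_n) * (b_n) \in (\ell^{r'})^* \equiv \ell^r$.

6. * Prove the reverse Minkowski inequality for $0 < p < 1$, and positive real sequences $x = (a_n)$, $y = (b_n)$, $a_n, b_n \ge 0$,

\[ \|x\|_p + \|y\|_p \le \|x + y\|_p. \]

(Hint: the reverse inequality has its roots in $x^p$ being concave.)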


9.2 Function Spaces 


Much of the above can be generalized from sequences to functions, where summation $\sum_n a_n$ becomes integration $\int f(t)\,dt$. For example, the proof that $\ell^\infty$ is complete generalizes to the space $L^\infty(\mathbb{R})$, practically untouched. Even though it is function spaces that are at the heart of "functional analysis", we do not prove all these generalizations here, as laying the groundwork for integration and measures would take us too far afield. Instead a review is provided, referring the reader to [6] for more details. However, we allow for vector-valued functions, because it does not incur any extra difficulty. To avoid confusion with the scalar $\|\cdot\|$, we write $|f|$ for the function $x \mapsto \|f(x)\|$.


Lebesgue Measure on $\mathbb{R}^N$

1. A measure $\mu$ on $\mathbb{R}^N$ is an assignment of positive numbers or $\infty$ to certain subsets $E \subseteq \mathbb{R}^N$ with the properties that it be

(i) additive, $\mu(E \cup F) = \mu(E) + \mu(F)$ for $E$, $F$ disjoint,

(ii) continuous, $E_n \to E \Rightarrow \mu(E_n) \to \mu(E)$.
We haven't defined a distance function on sets, but it is enough for now to take $E_n \to E$ to mean that $E_n$ is a decreasing sequence of sets of finite measure, with $\bigcap_n E_n = E$.
One final property that we expect $\mu$ to satisfy, at least in $\mathbb{R}^N$, is

(iii) a translated copy of a set has the same measure, $\mu(E + x) = \mu(E)$.

Examples of measures are the standard length, area, and volume of Euclidean geometry.

2. Taking $\mathbb{R}$ as our main example, and defining $\mu[0, 1[\, := 1$, these properties completely determine the length of any interval, namely $\mu[a, b] = b - a = \mu]a, b[$. (Hint: divide $[0, 1[$ into equal intervals to show $\mu[0, m/n[\, = m/n$.)


3. As a first step in constructing $\mu$ on $\mathbb{R}$, therefore, the length of any interval is defined to be the difference of its endpoints, e.g. $m[a, b] := b - a$. This function can be extended in two ways to

(a) the length of any countable union of disjoint intervals

\[ m\Bigl( \bigcup_n I_n \Bigr) := \sum_n m(I_n), \]

(b) the length of the set obtained by removing a countable union of disjoint subintervals from a bounded interval

\[ m\Bigl( I \setminus \bigcup_n I_n \Bigr) := m(I) - \sum_n m(I_n). \]


4. For general sets, define

\[
\begin{aligned}
m^*(A) &:= \inf\Bigl\{\, m(U) : A \subseteq U = \bigcup_n I_n \,\Bigr\}, \\
m_*(A) &:= \sup\Bigl\{\, m(K) : A \supseteq K = I \setminus \bigcup_n I_n \,\Bigr\}.
\end{aligned}
\]

(Note that since we are taking the infimum and supremum, respectively, we might as well take $I$ to be a closed and bounded interval and $I_n$ to be open intervals, in which case $U$ is an open set, and $K$ a compact set.)

It is a fact that there exist sets for which these two values do not agree (see [6]). A "well-behaved" set, called measurable, satisfies $m^*(E) = m_*(E)$, which is then called its Lebesgue measure $\mu(E)$.


5. $m^*(\bigcup_n A_n) \le \sum_n m^*(A_n)$ and $A \subseteq B \Rightarrow m^*(A) \le m^*(B)$ (since open covers for each $A_n$ provide an open cover for their union). Of course, these statements continue to hold for $\mu$ applied to measurable sets.

6. A useful equivalent criterion of measurability of $E$ is:

For any set $A$, $m^*(E \cap A) + m^*(E^c \cap A) = m^*(A)$.

7. Using this criterion, it follows that, for $E$, $F$, and $E_n$ measurable sets,
(a) $E^c$, $E \cup F$, $E \cap F$, $E \setminus F$, and $E \triangle F$ are measurable; when they are disjoint, $\mu(E \cup F) = \mu(E) + \mu(F)$.
(b) $\bigcup_{n=1}^\infty E_n$ and $\bigcap_{n=1}^\infty E_n$ are measurable, and when $E_n$ are disjoint,

\[ \mu\Bigl( \bigcup_{n=1}^\infty E_n \Bigr) = \sum_{n=1}^\infty \mu(E_n). \]

The sets that can be obtained by starting with the intervals and applying these constructions are called Borel sets; they include the open and closed sets.


8. Sets with ($m^*$-)measure 0 are obviously measurable and are called null sets. For example, any countable set is null; but most null sets are uncountable, e.g. the Cantor set. The countable union of null sets is null.

Adding (or removing) a null set $N$ from a measurable set $E$ does not affect its measure,

\[ \mu(E \cup N) = \mu(E) + \mu(N) = \mu(E). \]

Because measures don't distinguish sets up to a null set, we say that two sets are equal almost everywhere, $E = F$ a.e., when they differ by a null set. More generally, we qualify a statement "$P$ a.e." when $P(t)$ is true for all $t$ except on a null set; for example, we say $f = g$ a.e. when $f(t) = g(t)$ for all $t$ in their domain apart from a null set.

9. The distance between measurable sets is defined as $d(E, F) := \mu(E \triangle F)$. It is a metric, with the proviso that $d(E, F) = 0 \Leftrightarrow E = F$ a.e. The measure $\mu$ is continuous with respect to it, $E_n \to E \Rightarrow \mu(E_n) \to \mu(E)$.

10. A similar procedure gives the Lebesgue measure on $\mathbb{R}^N$, with the modification that cuboids are used instead of intervals to generate the measurable sets. Most subsets of $\mathbb{R}^N$ that the reader is likely to have encountered are measurable, including the unit sphere and ball in $\mathbb{R}^N$.


Measurable Functions 


1. The characteristic function of a set is defined by

\[ 1_E(t) := \begin{cases} 1 & t \in E, \\ 0 & t \notin E. \end{cases} \]

Linear combinations of characteristic functions $\sum_{n=1}^N 1_{E_n} x_n$, where $E_n$ are bounded measurable subsets of $\mathbb{R}$ and $x_n \in \mathbb{C}$, are called simple functions (or step functions). More generally, $\mathbb{R}$ can be replaced by a fixed measurable set $A$, and $x_n$ can belong to a Banach space $X$. The simple functions form a vector space $S$.

2. A function $f : A \to X$ is said to be measurable when it is the pointwise limit of simple functions, $s_n \to f$ a.e. For real-valued functions, this is equivalent to $f^{-1}[a, \infty[$ being measurable for all $a \in \mathbb{R}$.

Note that simple functions supported in $E$ (i.e., are zero outside $E$) can converge only to measurable functions supported in $E$ (since $s_n 1_E \to f 1_E$ a.e.).

3. Measurable functions form a vector space: $\lambda f$ and $f + g$ are measurable when $f$, $g$ are. It follows from $\bigl| |s_n| - |f| \bigr| \le |s_n - f|$ that $|f| : A \to \mathbb{R}$ is measurable. For real-valued measurable functions, $fg$, $\max(f, g)$, and $\sup_n(f_n)$ are also measurable. Real-valued continuous functions are measurable.

4. » In fact the space of measurable functions is in a sense complete: if $f_n$ are measurable and $f_n \to f$ a.e., then $f$ is measurable.

5. $L^\infty(A)$ is defined as the space of (equivalence classes of) bounded measurable functions $f : A \to \mathbb{C}$, over a measurable set $A$, with the supremum norm $\|f\|_{L^\infty} := \sup_{t\ \mathrm{a.e.}} |f(t)|$, i.e., the smallest real number $c$ such that $|f(t)| \le c$ a.e. $t$.

6. $L^\infty(\mathbb{R})$ contains the closed subspace of bounded continuous functions $C_b(\mathbb{R})$, which in turn contains $C_0(\mathbb{R}) := \{\, f \in C(\mathbb{R}) : \lim_{t \to \pm\infty} f(t) = 0 \,\}$. The space $C[a, b]$ is embedded in $C_0(\mathbb{R})$.

7. $L^\infty[a, b]$ is not separable: the uncountable number of characteristic functions $1_{[x,y]}$, $a < x < y < b$, are at unit distance from each other.


Proposition 9.16

$L^\infty(A)$ is a Banach space.

Proof If $|f(t)| \le \|f\|_{L^\infty}$ except on the null set $E_1$, and $|g(t)| \le \|g\|_{L^\infty}$ except on the null set $E_2$, then for all $t \in A \setminus (E_1 \cup E_2)$,

\[ |f(t) + g(t)| \le |f(t)| + |g(t)|, \qquad |\lambda f(t)| = |\lambda||f(t)|, \]

so $\|f + g\|_{L^\infty} \le \|f\|_{L^\infty} + \|g\|_{L^\infty}$, $\|\lambda f\|_{L^\infty} = |\lambda|\|f\|_{L^\infty}$.

Clearly $\|f\|_{L^\infty} = 0$ only when $|f(t)| = 0$ a.e. It follows that $L^\infty(A)$ is a normed space, as long as we identify a.e.-equal functions into equivalence classes.

Completeness: Let $f_n \in L^\infty(A)$ be a Cauchy sequence, where $|f_n(t)| \le \|f_n\|_{L^\infty}$ for all $t \in A$ except in some null set $E_n$. Copying the proof of the completeness of $\ell^\infty$ (Theorem 9.1),

\[ |f_n(t) - f_m(t)| \le \|f_n - f_m\|_{L^\infty} \to 0 \]

for each $t \in A$, except possibly on the null set $\bigcup_n E_n$, so $f_n(t)$ is Cauchy and converges $f_n(t) \to f(t)$ a.e.($t$). The function $f$ is evidently measurable, and $f_n \to f$ uniformly away from this null set, since for any $\epsilon > 0$ and $n$ large enough (but independent of $t$),

\[
\begin{aligned}
|f_n(t) - f(t)| &\le |f_n(t) - f_m(t)| + |f_m(t) - f(t)| \\
&\le \|f_n - f_m\|_{L^\infty} + |f_m(t) - f(t)| \quad \text{a.e.}(t) \\
&< 2\epsilon
\end{aligned}
\]

where $m \ge n$ is chosen, depending on $t$, to make $|f_m(t) - f(t)| < \epsilon$. This means that $f_n \to f$ in $L^\infty$, and implies $\|f\|_{L^\infty} \le \|f - f_n\|_{L^\infty} + \|f_n\|_{L^\infty} < \infty$, so $f \in L^\infty(A)$. □
Integrable Functions

1. Given a set $E$ of finite measure and its characteristic function, let $\int 1_E := \mu(E)$. For a simple function, define its integral

\[ \int \sum_{n=1}^N 1_{E_n} x_n := \sum_{n=1}^N \mu(E_n) x_n. \]

It is well-defined, since a simple function has a unique representation in terms of disjoint $E_n$. It is straightforward to verify that $\int (s + r) = \int s + \int r$ and $\int \lambda s = \lambda \int s$ for $s, r \in S$.

2. The function $\|s\| := \int |s| = \sum_n \mu(E_n)\|x_n\|$ is a norm on $S$. Here, $|s|$ is the real-valued simple function $|s| := \sum_n 1_{E_n} \|x_n\| \ge 0$. In particular, for real-valued simple functions, $r \le s \Rightarrow \int r \le \int s$.

Proof (i) $\|s + r\| = \sum_n \mu(E_n)\|x_n + y_n\| \le \sum_n \mu(E_n)(\|x_n\| + \|y_n\|) = \|s\| + \|r\|$,

(ii) $\|\lambda s\| = \sum_n \mu(E_n)\|\lambda x_n\| = |\lambda|\|s\|$,

(iii) $\int |s| = 0$ when $\sum_n \mu(E_n)\|x_n\| = 0$. This implies $\mu(E_n)\|x_n\| = 0$ for all $n$, i.e., $x_n = 0$ or $\mu(E_n) = 0$, so $s = 0$ a.e.


3. The integral is a continuous functional on $S$, $\|\int s\| \le \int |s|$, since

\[ \Bigl\| \int s \Bigr\| = \Bigl\| \sum_n \mu(E_n) x_n \Bigr\| \le \sum_n \mu(E_n)\|x_n\| = \int |s|. \]

4. The space of real (or complex) simple functions with this norm is separable (the simple functions with $x_n \in \mathbb{Q}$ and $E_n$ equal to intervals with rational endpoints are countable and dense), but not complete.

5. A Cauchy sequence of simple functions converges a.e. to a measurable function.

Proof Let $s_n$ be a Cauchy sequence in $S$. Given any $\epsilon > 0$, let

\[ E_N := \{\, t \in \mathbb{R} : \exists n, m \ge N,\ \|s_n(t) - s_m(t)\| \ge \epsilon \,\}, \]

\[ \therefore\ \epsilon\,\mu(E_N) = \int \epsilon 1_{E_N} \le \int |s_n - s_m| = \|s_n - s_m\| \to 0 \quad\text{as } N \to \infty. \]

This shows that for $t$ not in the null set $F_\epsilon := \bigcap_N E_N$,

\[ \exists N,\ \forall n, m \ge N, \quad \|s_n(t) - s_m(t)\| < \epsilon. \]

In particular, for $t$ not in the null set $\bigcup_{k \in \mathbb{N}} F_{1/k}$, $\|s_n(t) - s_m(t)\| \to 0$ as $n, m \to \infty$. Thus, except for a null set, $(s_n(t))$ is a Cauchy sequence in $X$ and hence converges.


6. A function $f : \mathbb{R} \to X$ is said to be integrable when it is the a.e.-limit of a Cauchy sequence of simple functions $s_n \to f$ a.e. Its integral is given by the extension of the integral on $S$,

\[ \int f := \lim_{n \to \infty} \int s_n. \]

Note that $\int s_n$ is a Cauchy sequence in $X$ ($\|\int s_n - \int s_m\|_X \le \int |s_n - s_m| \to 0$). The space of (equivalence classes of) integrable functions $\mathbb{R} \to X$ is denoted by $L^1(\mathbb{R}, X)$; it is the completion of $S$ (Theorem 4.6). By Proposition 7.17, the space $L^1(\mathbb{R}, X)$ is a normed vector space with

\[ \|f\|_{L^1} := \lim_{n \to \infty} \|s_n\| = \lim_{n \to \infty} \int |s_n| = \int |f|, \]

so $f \in L^1(\mathbb{R}, X) \Leftrightarrow |f| \in L^1(\mathbb{R})$. It also follows that for real-valued integrable functions $f \le g \Rightarrow \int f \le \int g$.

7. The integral is a continuous functional on $L^1(\mathbb{R}, X)$ (Example 8.9(4)),

\[ \int (f + g) = \int f + \int g, \qquad \int \lambda f = \lambda \int f, \qquad \Bigl\| \int f \Bigr\| \le \int |f|. \]

If $f_n \to f$ in $L^1(\mathbb{R}, X)$ then $\int f_n \to \int f$ in $X$.


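8. (a) $f \in L^1(\mathbb{R}) \Rightarrow \int f(t)x\,dt = (\int f)x$,
(b) $T \in B(X, Y) \Rightarrow \int Tf = T\int f$.

Proof (a) is a special case of (b) with $T : \mathbb{F} \to X$, $T(\lambda) := \lambda x$.
As an operator $T : X \to Y$ acts linearly on a simple function $s = \sum_n 1_{E_n} x_n \in S$,

\[ Ts = \sum_{n=1}^N 1_{E_n} Tx_n \;\Rightarrow\; \int Ts = \sum_{n=1}^N \mu(E_n) Tx_n = T\int s. \]

If $s_n \to f$ in $L^1(\mathbb{R}, X)$ then $Ts_n \to Tf$ in $L^1(\mathbb{R}, Y)$, so $\int Tf = T\int f$.

9. For a measurable set $A \subseteq \mathbb{R}$, define $L^1(A) := \{\, f 1_A : f \in L^1(\mathbb{R}) \,\}$, and let

\[ \int_A f := \int f 1_A. \]

Note that $\int_A f = 0$ for any null set $A$. Hence if $f = g$ a.e., with $g \in L^1(\mathbb{R})$, and $E = F$ a.e., then $f \in L^1(\mathbb{R})$ as well and $\int_E f = \int_F g$.

10. For $E$, $F$ disjoint measurable sets,

\[ \int_{E \cup F} f = \int_E f + \int_F f. \]

It follows that $E \subseteq F \Rightarrow \int_E |f| \le \int_F |f|$.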
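The integral of a simple function (Note 1) and the norm $\|s\| = \int |s|$ (Note 2) reduce to finite sums, as the following sketch illustrates. It assumes, for illustration only, that each $E_n$ is an interval $[lo, hi[$ with measure $hi - lo$ and that the values $x_n$ are real; the function names and the sample function $s$ are hypothetical.

```python
from fractions import Fraction

def integrate_simple(pieces):
    """Integral of sum_n 1_{E_n} x_n, where each E_n = [lo, hi[ is an interval.

    `pieces` is a list of ((lo, hi), x_n); the measure of E_n is hi - lo.
    """
    return sum((hi - lo) * x for (lo, hi), x in pieces)

def l1_norm_simple(pieces):
    """The norm ||s|| = integral of |s|, valid when the intervals are disjoint."""
    return sum((hi - lo) * abs(x) for (lo, hi), x in pieces)

# s = 2 on [0, 1/2[, -3 on [1/2, 1[, and 1 on [1, 3[
s = [((Fraction(0), Fraction(1, 2)), 2),
     ((Fraction(1, 2), Fraction(1)), -3),
     ((Fraction(1), Fraction(3)), 1)]

integral = integrate_simple(s)     # 1 - 3/2 + 2 = 3/2
norm = l1_norm_simple(s)           # 1 + 3/2 + 2 = 9/2
print(integral, norm, abs(integral) <= norm)
```

The final comparison is the inequality $\|\int s\| \le \int |s|$ of Note 3, here $\frac{3}{2} \le \frac{9}{2}$.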


Theorem 9.17

$L^1(A)$ is a separable Banach space.

Henri Lebesgue (1875-1941) graduated at the École Normale Supérieure of Paris at 27 years. His thesis built upon work of Baire, Borel and Jordan, to generalize lengths and areas, and so an integration powerful enough to tackle functions too discontinuous for Riemann's integration, the first complete space of integrable functions. After a century of attempts by other mathematicians, he finally proved that uniformly bounded series of integrable functions, such as the Fourier series, could be integrated term by term. Although his achievement was widely seen as abstract, in his words, "Reduced to general theories, mathematics would be a beautiful form without content. It would quickly die."

Fig. 9.2 Lebesgue


Proof Completeness: Let $f_n$ be a Cauchy sequence in $L^1(A)$, i.e., $\|f_n - f_m\| \to 0$. Choose $s_n \in S$ close to $f_n$, say $\|s_n - f_n\| < 1/n$. Then $(s_n)$ is a Cauchy sequence of simple functions, asymptotic to $f_n$. By Notes 5 and 7 above, $s_n$ converges to an integrable function $f$ in $L^1(A)$. Hence, so does the asymptotic sequence $f_n$.

Separability: By construction, the separable set $S$ of simple functions is dense in $L^1(A)$: any $f \in L^1(A)$ has a sequence of simple functions converging to it, $(s_n) \to f$ a.e., so $\|f - s_n\|_{L^1} \to 0$ as $n \to \infty$. □

Much the same analysis can be made starting with the norm $\|s\|_{L^p} := \bigl( \int |s|^p \bigr)^{1/p}$, $1 \le p < \infty$, on $S$. The completion of $S$ in this norm is denoted by $L^p(A)$, which is thus complete and separable ($S$ dense in it).


Proposition 9.18

If $f_n \to f$ in $L^\infty(\mathbb{R})$, that is, uniformly, and
(i) $f_n$ are continuous, then $f$ is continuous,

(ii) $f_n$ are integrable, then $f$ is integrable on $[a, b]$, and

\[ \int_a^b f_n \to \int_a^b f, \]

(iii) $f_n'$ are continuous and converge uniformly, then $f_n' \to f'$.

Proof (i) The first assertion is a restatement of the fact that $C_b(\mathbb{R})$ is closed in $L^\infty(\mathbb{R})$ (Theorem 6.23).

(ii) The second follows from the completeness of $L^1[a, b]$ and the continuity of the integral

\[ \int_a^b |f_n - f| \le (b - a)\|f_n - f\|_{L^\infty[a,b]} \to 0. \]

(iii) If $f_n' \to g$ uniformly, then $f_n' \to g$ in $L^1[a, t]$ by (ii), and $\int_a^t f_n' \to \int_a^t g$. But, assuming the fundamental theorem of calculus (Theorem 12.8), $\int_a^t f_n' = f_n(t) - f_n(a)$, which converge to $f(t) - f(a)$ uniformly and in $L^1[a, t]$. So $\int_a^t g = f(t) - f(a)$, showing $f$ is differentiable, with $f' = g$. □


Examples 9.19

1. Convergence in $L^1(\mathbb{R})$ is quite different from uniform convergence. For example, the sequence of functions $\frac{1}{n} 1_{[0,n]}$ converges uniformly to 0, but not in $L^1(\mathbb{R})$, whereas the sequence $n 1_{[0,1/n^2]}$ converges to 0 in $L^1(\mathbb{R})$ but not uniformly.

2. The product $x \cdot y$ of sequences becomes $f \cdot g := \int fg$ for functions. Hölder's inequalities are valid:

\[
\begin{aligned}
|f \cdot g| &\le \|f\|_{L^p} \|g\|_{L^{p'}}, & 1 &= \frac{1}{p} + \frac{1}{p'}, \\
\|fg\|_{L^r} &\le \|f\|_{L^p} \|g\|_{L^q}, & \frac{1}{r} &= \frac{1}{p} + \frac{1}{q}, \\
\|f\|_{L^r} &\le \|f\|_{L^p}^{\alpha} \|f\|_{L^q}^{1-\alpha}, & \frac{1}{r} &= \frac{\alpha}{p} + \frac{1-\alpha}{q};
\end{aligned}
\]

thus $f$ lies in $L^p(A)$ for $p$ in an interval of values.

Proof Integrating $|a(t)b(t)| \le \frac{|a(t)|^p}{p} + \frac{|b(t)|^{p'}}{p'}$ and putting $a = f/\|f\|_{L^p}$, $b = g/\|g\|_{L^{p'}}$, gives the first inequality. Substituting $|f|^r$ for $f$, $|g|^r$ for $g$, $p/r$ for $p$, and $q/r$ for $p'$ gives the second inequality. Finally, the substitutions of $|f|^\alpha$ for $f$, $|f|^{1-\alpha}$ for $g$, $p/\alpha$ for $p$, and $q/(1-\alpha)$ for $q$ give the third.

3. » When the domain of the functions is compact, the spaces are included in each other as sets, in the reverse order of the sequence spaces,

\[ C[a, b] \subset L^\infty[a, b] \subset L^2[a, b] \subset L^1[a, b], \]

because, by Hölder's inequality, the identity map $I : L^\infty[a, b] \to L^2[a, b] \to L^1[a, b]$ is continuous,

\[ \|f\|_{L^1[a,b]} \le (b - a)^{1/2} \|f\|_{L^2[a,b]} \le (b - a)\|f\|_{L^\infty[a,b]}. \]

4. The notation $\int_{-\infty}^{\infty} f$ is capable of at least three interpretations, as (i) $\int_{\mathbb{R}} f$ when $f \in L^1(\mathbb{R})$, (ii) $\lim_{R,S \to \infty} \int_{-R}^{S} f$, (iii) $\lim_{R \to \infty} \int_{-R}^{R} f$. It should be clear that the finiteness of these integrals follows (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii), but the examples $\int_{-R}^{R} x\,dx = 0$ and $\int_0^R \frac{\sin x}{x}\,dx \to \pi/2$ as $R \to \infty$ show that the converses are false.
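The contrast in Example 1 between uniform and $L^1$ convergence can be made concrete with step functions of the form $h\,1_{[a,b]}$, whose two norms are simply $|h|$ and $|h|(b - a)$. In the sketch below the two sequences, $\frac{1}{n} 1_{[0,n]}$ and $n 1_{[0,1/n^2]}$, are illustrative choices assumed here, and the helper `step_norms` is hypothetical.

```python
def step_norms(height, a, b):
    """Return (sup norm, L^1 norm) of the step function height * 1_[a,b], a < b."""
    return abs(height), abs(height) * (b - a)

table = []
for n in (1, 10, 100, 1000):
    flat_sup, flat_l1 = step_norms(1.0 / n, 0.0, float(n))         # (1/n) 1_[0,n]
    spike_sup, spike_l1 = step_norms(float(n), 0.0, 1.0 / n**2)    # n 1_[0,1/n^2]
    table.append((n, flat_sup, flat_l1, spike_sup, spike_l1))
    print(n, flat_sup, flat_l1, spike_sup, spike_l1)
```

The flat sequence has supremum norm $1/n \to 0$ while its $L^1$-norm stays at 1; the spike sequence has $L^1$-norm $1/n \to 0$ while its supremum norm $n$ grows without bound.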
Approximation of Functions

Proposition 9.20

The polynomials are dense in $L^1[a, b]$, $L^2[a, b]$ and $C[a, b]$.

Proof By construction, the step functions are dense in $L^1(\mathbb{R})$. Now, intuitively speaking, any real-valued step function $s$ can be "nudged" into a continuous function $g$ by replacing its discontinuities with steep slopes, and the distance $\|s - g\|_{L^1}$ can be made as small as needed by making the slopes steeper. More precisely and more generally, any bounded measurable set $E$ in $\mathbb{R}$ lies between a compact set $K$ and an open set $U$, such that $\mu(U \setminus K) < \epsilon$; also, there is a continuous function $g_E$ taking values in $[0, 1]$ such that $g_E K = 1$, $g_E U^c = 0$ (Exercise 3.12(17)). So

\[ \forall \epsilon > 0,\ \exists g_E \in C(\mathbb{R}), \quad \|g_E - 1_E\|_{L^1} = \int_{U \setminus K} |g_E - 1_E| \le \mu(U \setminus K) < \epsilon. \]

Consequently, taking any non-zero simple function $s = \sum_{n=1}^N 1_{E_n} x_n$ and replacing each $1_{E_n}$ with continuous functions $g_n$, where $\|g_n - 1_{E_n}\|_{L^1} < \epsilon / \sum_{n=1}^N \|x_n\|$, gives a continuous function $g := \sum_{n=1}^N g_n x_n$, which approximates $s$ in $L^1$,

\[ \|s - g\|_{L^1} \le \sum_{n=1}^N \|1_{E_n} - g_n\|_{L^1} \|x_n\| < \epsilon. \]

But any function $f \in L^1(\mathbb{R})$ has a simple function approximation $s$, which in turn can be approximated by a continuous function $g$. Combining these two facts gives

\[ \|f - g\|_{L^1} \le \|f - s\|_{L^1} + \|s - g\|_{L^1} < 2\epsilon \]

showing that the set of (integrable) continuous functions is dense in $L^1(\mathbb{R})$. Note further that precisely the same arguments work for $L^2(\mathbb{R})$.

We have already seen, in the Stone-Weierstraß theorem (Theorem 6.24), that the set of polynomials $p(z, \bar{z})$ is dense in $C[a, b]$. But, in this case, $z = \bar{z} = x \in [a, b]$, so such polynomials are of the usual form $p \in \mathbb{C}[x]$. Combining this with the above result shows that $\mathbb{C}[x]$ is also dense in $L^1[a, b]$ and $L^2[a, b]$: for any $\epsilon > 0$, there is a polynomial $p \in \mathbb{C}[x]$ such that

\[ \|f - p\|_{L^1[a,b]} \le \|f - g\|_{L^1[a,b]} + \|g - p\|_{L^1[a,b]} < 3\epsilon \]

since $\|g - p\|_{L^1[a,b]} \le (b - a)\|g - p\|_{C[a,b]} < \epsilon$. □


More generally, the polynomial splines are dense in the real version of these spaces. A spline of degree $N$ is a function $\sum_n 1_{E_n} p_n$, where $E_n$ are disjoint intervals and $p_n$ are polynomials of degree at most $N$ such that the first $N - 1$ derivatives match at the endpoints of $E_n$. They are often used in numerical techniques and graphics computing. Another useful way of approximating integrable functions uses the convolution:

Proposition 9.21 Approximations of the Identity

If $h_n \in C(\mathbb{R})$ are such that $h_n \ge 0$, $\int h_n = 1$ and $h_n \to 0$ uniformly on $\mathbb{R} \setminus [-\delta, \delta]$ for any $\delta > 0$, then $h_n * f \to f$ in $C(\mathbb{R})$ and $L^1[a, b]$.


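Proof Let $g$ be a continuous function, and let $x \in \mathbb{R}$; on the one hand,

\[ \forall \epsilon > 0,\ \exists \delta > 0, \quad |y| < \delta \;\Rightarrow\; |g(x + y) - g(x)| < \epsilon, \tag{9.6} \]

and on the other hand, for this $\delta$,

\[ \exists N, \quad n \ge N \text{ AND } |y| \ge \delta \;\Rightarrow\; 0 \le h_n(y) < \epsilon. \tag{9.7} \]

Therefore, for all $x$ and $n \ge N$,

\[
\begin{aligned}
|h_n * g(x) - g(x)| &= \Bigl| \int h_n(y)\bigl( g(x - y) - g(x) \bigr)\,dy \Bigr| \\
&\le \int h_n(y)\,|g(x - y) - g(x)|\,dy \\
&\le \int_{-\delta}^{\delta} h_n(y)\,\epsilon\,dy + \int_{\mathbb{R} \setminus [-\delta, \delta]} 2\epsilon\|g\|_C\,dy \quad\text{by (9.6) and (9.7)} \\
&\le \epsilon(1 + 2\|g\|_C)
\end{aligned}
\]

and $\|h_n * g - g\|_C \to 0$ as required.

In fact $h_n * f$ approximates $f \in L^1[a, b]$ in the $L^1$-norm, for, choosing $g \in C[a, b]$ close to $f$, $\|f - g\|_{L^1[a,b]} < \epsilon$, and $n$ large enough that $\|h_n * g - g\|_C < \epsilon$ holds, then

\[ \|h_n * f - f\|_{L^1[a,b]} \le \|h_n * g - g\|_{L^1[a,b]} + \|h_n * (f - g)\|_{L^1[a,b]} + \|f - g\|_{L^1[a,b]} < 3\epsilon, \]

since $\|h_n * (f - g)\|_{L^1} \le \|h_n\|_{L^1} \|f - g\|_{L^1}$. □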
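The convergence $h_n * g \to g$ can be observed numerically. The sketch below is only illustrative and rests on several assumptions made here: it uses the step kernels $h_n := n 1_{[-1/2n, 1/2n]}$ that appear in the proof of Corollary 9.22 (rather than continuous kernels), a midpoint rule for the integral, the test function $g(x) = |\sin 3x|$, and a finite grid of sample points in place of a true supremum.

```python
import math

def box_kernel_convolution(g, x, n, steps=200):
    """Approximate (h_n * g)(x) for h_n = n 1_[-1/2n, 1/2n] by the midpoint rule."""
    width = 1.0 / n
    dy = width / steps
    total = 0.0
    for k in range(steps):
        y = -width / 2 + (k + 0.5) * dy
        total += n * g(x - y) * dy
    return total

def sup_error(g, n, grid):
    """Largest value of |h_n * g - g| over the sample points in `grid`."""
    return max(abs(box_kernel_convolution(g, x, n) - g(x)) for x in grid)

g = lambda x: abs(math.sin(3 * x))          # continuous but not differentiable
grid = [k / 50.0 for k in range(-100, 101)]
errors = [sup_error(g, n, grid) for n in (2, 8, 32, 128)]
print(errors)
```

Convolving with this kernel replaces $g(x)$ by its average over a window of width $1/n$, so the error is largest near the kinks of $g$ and shrinks as the window narrows.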
Corollary 9.22

\[ \|f(x + y) - f(x)\|_{L^1(\mathbb{R})} \to 0 \quad\text{as } y \to 0. \]

Proof The step functions $h_n := n 1_{[-1/2n, 1/2n]}$ clearly form an approximation of the identity, and so $h_n * f \to f$ in $L^1$. But their translations by $T_y f(x) := f(x - y)$, namely $h_n^+ := T_{1/2n} h_n = n 1_{[0, 1/n]}$ and $h_n^- := T_{-1/2n} h_n = n 1_{[-1/n, 0]}$, form other approximations of the identity. Since $(T_y h) * f = h * (T_y f)$,

\[
\begin{aligned}
\int \bigl| f(x \mp \tfrac{1}{2n}) - f(x) \bigr|\,dx &= \|T_{\pm 1/2n} f - f\|_{L^1} \\
&\le \|T_{\pm 1/2n} f - (T_{\pm 1/2n} h_n) * f\|_{L^1} + \|(T_{\pm 1/2n} h_n) * f - f\|_{L^1} \\
&= \|T_{\pm 1/2n} f - h_n * (T_{\pm 1/2n} f)\|_{L^1} + \|h_n^{\pm} * f - f\|_{L^1} \\
&\to 0 \quad\text{as } n \to \infty.
\end{aligned}
\]


Exercises 9.23

1. The map $(a_n) \mapsto \sum_n a_n 1_{[n, n+1[}$ embeds $\ell^1$ into $L^1(\mathbb{R})$.

2. If $\sum_n \|f_n\|_{L^1}$ exists, then $\sum_n \int f_n = \int \sum_n f_n$.

3. The map $L^1(A) \to \mathbb{C}$, $f \mapsto \int gf$ is linear, and continuous when $g \in L^\infty(A)$. Assuming surjectivity, show $L^1(K)^* \equiv L^\infty(K)$ for $K \subseteq \mathbb{R}$ compact, and similarly $L^p(K)^* \equiv L^{p'}(K)$ ($p > 1$).

4. Show that the functional $\delta_a(f) := f(a)$ on $C[a, b]$ does not correspond to any $L^1$-function $\delta$ in the sense of $\delta_a(f) = \int \delta f$. Hence the dual space of $C[a, b]$ is not $L^1[a, b]$; it consists of functionals called measures of bounded variation.

5. Minkowski's inequality: Emulate the proof of Proposition 9.12 to show $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$ ($p \ge 1$).

6. * $L^1(\mathbb{R})$ has a convolution defined on it,

\[ f * g(x) := \int f(x - y)g(y)\,dy. \]

Just like the same-named operation in $\ell^1$, it is associative and commutative; but it has no identity, although Gibbs and Dirac audaciously added one and called it $\delta$. Young's inequality is satisfied,

\[ \|f * g\|_{L^r} \le \|f\|_{L^p} \|g\|_{L^q}, \qquad 1 + \frac{1}{r} = \frac{1}{p} + \frac{1}{q}. \]

7. Matched Filter: An electronic filter is a circuit acting on a signal $f \in L^2(\mathbb{R})$ and outputting the convolution $g * f$ ($g \in L^1(\mathbb{R})$). Signals often have white noise $\eta(t)$, where $\|g * \eta\|_{L^2} = \epsilon\|g\|_{L^2}$. The signal-to-noise ratio is $S/N := \|g * f\|_{L^\infty}^2 / \|g * \eta\|_{L^2}^2$; show that $S/N \le \|f\|_{L^2}^2 / \epsilon^2$, with equality holding when $g(x - t) = \lambda f(t)$, $\lambda \in \mathbb{R}$.


The Fourier Series 


We end this chapter with a look at one of the most important operators on $L^1[0, 1]$. Back to the days of Fourier, there arose the question of whether every periodic function $f$ can be built up as a Fourier series $\sum_n a_n \cos nx + b_n \sin nx$. This claim of Fourier was disputed by Lagrange and others; Dirichlet obtained a partial result for the case $f \in C^2$, and Riemann later vastly extended this result. Despite these protests, the use of Fourier series grew, mainly because they actually worked in many examples.


Definition 9.24

The Fourier coefficients of an integrable function $f \in L^1[0, 1]$ are the sequence of numbers defined by

\[ \mathcal{F}f(n) = \hat{f}(n) := \int_0^1 e^{-2\pi i n x} f(x)\,dx, \qquad n \in \mathbb{Z}. \]


This section cannot do justice to the immense number of results and applications 
of Fourier series. It must suffice here to present a couple of main results, with the 
aim of generalizing them later on. Refer to [7] for more details. 


Theorem 9.25

$\mathcal{F} : L^1[0, 1] \to c_0(\mathbb{Z})$ is a 1-1 continuous operator with

\[ \|\hat{f}\|_{\ell^\infty} \le \|f\|_{L^1[0,1]}. \]

Here, $c_0(\mathbb{Z})$ is defined as consisting of those 'sequences' $(a_n)_{n \in \mathbb{Z}}$ such that $a_n \to 0$ as $n \to \pm\infty$.

Proof That $\mathcal{F}$ is linear is easy to show. It is continuous because

\[ \|\hat{f}\|_{\ell^\infty} = \sup_{n \in \mathbb{Z}} \Bigl| \int_0^1 e^{-2\pi i n x} f(x)\,dx \Bigr| \le \int_0^1 |f(x)|\,dx = \|f\|_{L^1[0,1]}. \]

The characteristic function $1_{[a,b]}$, for $[a, b] \subseteq [0, 1]$, has Fourier coefficients

\[ \hat{1}_{[a,b]}(n) = \int_a^b e^{-2\pi i n x}\,dx = \frac{e^{-2\pi i n a} - e^{-2\pi i n b}}{2\pi i n} \to 0 \quad\text{as } n \to \pm\infty. \]

Hence the vector space of simple functions, as well as its closure $L^1[0, 1]$, are mapped into the complete space $c_0$ (Exercises 8.10(8)).

$\mathcal{F}$ is 1-1: If $\hat{f}(n) = 0$ for every $n$, then

\[ \int_0^1 e^{-2\pi i n y} f(y)\,dy = 0, \qquad \forall n \in \mathbb{Z}. \]

The aim is to show that $f = 0$ a.e. Firstly,

\[ \int_0^1 e^{2\pi i n y} f(x - y)\,dy = \int_0^1 e^{2\pi i n (x - y)} f(y)\,dy = 0. \]

Secondly, since $(\cos \pi y)^{2n} = (e^{2\pi i y} + e^{-2\pi i y} + 2)^n / 2^{2n}$ is a linear combination of exponentials of various frequencies that are all multiples of $2\pi y$, we have, for $h_n(y) := (\cos \pi y)^{2n} / c_n$,

\[ h_n * f(x) = \frac{1}{c_n} \int_0^1 (\cos \pi y)^{2n} f(x - y)\,dy = 0, \]

where $c_n = \int_0^1 (\cos \pi y)^{2n}\,dy = \frac{(2n-1)(2n-3)\cdots 1}{2n(2n-2)\cdots 2}$.

The functions $h_n$ satisfy the criteria of Proposition 9.21, as they are positive and fall rapidly to 0 for $|y| > \delta$, as $n \to \infty$. Thus $\|f\|_{L^1} = \|h_n * f - f\|_{L^1} \to 0$, and $f = 0$ a.e. □
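The closed form for the Fourier coefficients of a characteristic function, used in the proof above, can be compared against a direct numerical integration. This is a rough sketch under assumptions made here: the midpoint rule with a fixed number of steps, the interval $[a, b] = [0.25, 0.75]$, and the helper names are all illustrative.

```python
import cmath

def fourier_coefficient(f, n, steps=20000):
    """Midpoint-rule approximation of the integral of e^{-2 pi i n x} f(x) over [0, 1]."""
    dx = 1.0 / steps
    total = 0j
    for k in range(steps):
        x = (k + 0.5) * dx
        total += cmath.exp(-2j * cmath.pi * n * x) * f(x) * dx
    return total

def indicator_coefficient(a, b, n):
    """Closed form for the n-th coefficient of 1_[a,b], valid for n != 0."""
    w = 2j * cmath.pi * n
    return (cmath.exp(-w * a) - cmath.exp(-w * b)) / w

a, b = 0.25, 0.75
indicator = lambda x: 1.0 if a <= x <= b else 0.0
gaps = {}
for n in (1, 2, 5, 10):
    gaps[n] = abs(fourier_coefficient(indicator, n) - indicator_coefficient(a, b, n))
    print(n, abs(indicator_coefficient(a, b, n)), gaps[n])
```

Since the numerator of the closed form has modulus at most 2, the coefficients are bounded by $1/(\pi |n|)$, which is the decay to 0 asserted in the theorem for simple functions.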


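The Fourier coefficients have properties that appear remarkable: when $f$ is translated the coefficients rotate in $\mathbb{C}$, with each $\hat{f}(n)$ performing $n$ turns as $f$ is translated one whole period; differentiation of $f$ scales the coefficients by a multiple of $n$; and convolutions are transformed to multiplications.

Proposition 9.26

For periodic functions, with period 1,

\[ \widehat{T_a f}(n) = e^{-2\pi i n a} \hat{f}(n), \qquad \widehat{f'}(n) = 2\pi i n\,\hat{f}(n), \qquad \widehat{f * g} = \hat{f}\hat{g}. \]

Proof A translation $T_a f(x) := f(x - a)$ has the effect

\[ \widehat{T_a f}(n) = \int_0^1 e^{-2\pi i n x} f(x - a)\,dx = \int_0^1 e^{-2\pi i n (x + a)} f(x)\,dx = e^{-2\pi i n a} \hat{f}(n). \]

Differentiation, $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$, gives

\[ \widehat{f'}(n) = \lim_{h \to 0} \frac{\widehat{T_{-h} f}(n) - \hat{f}(n)}{h} = \lim_{h \to 0} \frac{e^{2\pi i n h} - 1}{h} \hat{f}(n) = 2\pi i n\,\hat{f}(n), \]

and the convolution of $f$ and $g$ becomes

\[
\begin{aligned}
\widehat{f * g}(n) &= \int_0^1 e^{-2\pi i n x} \int_0^1 f(x - y) g(y)\,dy\,dx \\
&= \int_0^1 \int_0^1 e^{-2\pi i n (x + y)} f(x)\,dx\, g(y)\,dy \\
&= \int_0^1 e^{-2\pi i n x} f(x)\,dx \int_0^1 e^{-2\pi i n y} g(y)\,dy = \hat{f}(n)\hat{g}(n).
\end{aligned}
\]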
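The translation and convolution rules can be checked on trigonometric polynomials, for which a Riemann sum over $N$ equally spaced points reproduces the integrals up to rounding error (as long as the frequencies involved are smaller than $N$). Everything in this sketch is an illustrative assumption: the number of sample points, the test functions $f$ and $g$, the shift $a = 0.3$, and the helper names.

```python
import cmath

N = 64                                   # sample points k/N on [0, 1[

def coeff(f, n):
    """Riemann-sum approximation of the n-th Fourier coefficient of f."""
    return sum(cmath.exp(-2j * cmath.pi * n * k / N) * f(k / N) for k in range(N)) / N

def e(m):
    """The exponential x -> e^{2 pi i m x}."""
    return lambda x: cmath.exp(2j * cmath.pi * m * x)

f = lambda x: 2 * e(1)(x) - 1j * e(3)(x)            # trigonometric polynomials
g = lambda x: e(1)(x) + 0.5 * e(-2)(x) + 4 * e(3)(x)
a = 0.3
translated = lambda x: f(x - a)                     # T_a f

def convolution(x):
    """(f * g)(x) = integral over [0, 1] of f(x - y) g(y) dy, by a Riemann sum."""
    return sum(f(x - k / N) * g(k / N) for k in range(N)) / N

results = {}
for n in (1, 3):
    shift_gap = abs(coeff(translated, n) - cmath.exp(-2j * cmath.pi * n * a) * coeff(f, n))
    conv_gap = abs(coeff(convolution, n) - coeff(f, n) * coeff(g, n))
    results[n] = (shift_gap, conv_gap)
    print(n, shift_gap, conv_gap)
```

Here $\hat{f}(1) = 2$ and $\hat{f}(3) = -i$, so the convolution has coefficients $2 \cdot 1$ and $-i \cdot 4$ at $n = 1$ and $n = 3$, in line with $\widehat{f * g} = \hat{f}\hat{g}$.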


Exercises 9.27 


1. Show 


(b) Fixe £¢..,-3,-L F,15,...,4,..0, 


(©) F:|lx-—5Z1 3C..,0,1, 5, 1,0, 5,0, 35...) 


d) F:x@—5)@-I bw FRC... -37,-1,0,1, §,...,5,...). 
2. Using the open mapping theorem, show that F is not onto co. 


3. The power spectrum of a function is a plot of $|\hat f(n)|^2$ (often with $n$ varying
continuously in $\mathbb{R}$). It displays the dominant frequencies of $f$. A better plot is
the Nyquist diagram, where $\hat f(n)$ is graphed in three dimensions, with one axis
representing $n$, and the other two representing $\hat f = |\hat f| e^{i\theta}$.
Prove that $\mathcal{F} : C^k[0,1] \to c_k(\mathbb{Z})$, where $C^k[0,1]$ is the space of $k$-times continuously
differentiable periodic functions, and $c_k(\mathbb{Z}) := \{\, (a_n)_{n\in\mathbb{Z}} : n^k a_n \to 0 \,\}$.
Therefore, how fast the power spectrum decays as $n \to \infty$ measures how smooth
the function is.

. The operator $S_a : f(x) \mapsto a^{1/2} f(ax)$ ($a > 0$) stretches or compresses $f$, while
preserving its $L^2$-norm; prove $\widehat{S_a f}(n) = S_{1/a}\hat f(n)$. This should be familiar:
playing a sound clip in half its normal time doubles the frequencies.


. Recall the analogous Fourier transform on $L^1(\mathbb{R})$, $\hat f(\xi) := \int_{-\infty}^{\infty} e^{-2\pi i x \xi} f(x)\,dx$.

Similarly to the Fourier series,

(a) it is a continuous linear operator $\mathcal{F} : L^1(\mathbb{R}) \to C_0(\mathbb{R})$,

(b) 1f-a.a(€) = a sin(wag)/(rag) =: a sinc(aé) > 0 as > £00, 
(c) $\widehat{T_a f}(\xi) = e^{-2\pi i a \xi}\hat f(\xi)$,

(d) $\widehat{f'}(\xi) = 2\pi i \xi\,\hat f(\xi)$,

(e) $\widehat{f * g} = \hat f\,\hat g$.


2 fed 252 : : 
ie zee /o" = \/ge-™°'5", Deduce that the convolution of two Gaussian 


functions is another Gaussian function, 


$$e^{-x^2/2\sigma^2} * e^{-x^2/2\tau^2} = \frac{\sqrt{2\pi}\,\sigma\tau}{\sqrt{\sigma^2 + \tau^2}}\; e^{-x^2/2(\sigma^2 + \tau^2)}.$$

Notice how there is a trade-off between the ‘width’ $\sigma$ of the original Gaussian
and that of its Fourier transform, namely $1/\sigma$.


. Wiener-Khinchin theorem: For $f \in L^1(\mathbb{R})$, define $f^*(x) := \overline{f(-x)}$. Show
$\widehat{f^*} = \overline{\hat f}$, and the auto-correlation function $f^* * f(x) = \int \overline{f(t)}\, f(t + x)\,dt$
is transformed to the power spectrum $|\hat f(\xi)|^2$. More generally, $f^* * g$ is called
the cross-correlation function of $f$ and $g$.


Remarks 9.28 


1. We often make remarks like “the dual space of $c_0$ is $\ell^1$”; this is not literally true
because a functional on $c_0$ is not a sequence, but the application of one, i.e., it is
$y^\top$ not $y$. But the two are mathematically the same object in different clothing,
and functionals on $c_0$ do behave like the sequences in $\ell^1$.

2. The functionals on $\ell^\infty$ are more difficult to describe. Every sequence $y \in \ell^1$ still
acts as a functional on $\ell^\infty$ via $x \mapsto y \cdot x$, but $\ell^{\infty *}$ is a complicated non-separable
space that includes much more than just $\ell^1$ (look up “finitely additive measures”
for more).

3. $\ell^\infty = C_b(\mathbb{N})$, so the completeness part of Theorem 9.1 is included in Theorem 6.23.

4. The Fibonacci iteration $a_n := a_{n-1} + a_{n-2}$, starting from $a_0 = 1 = a_1$, is an
equation on sequences. It can be written as

$$x = Rx + R^2 x + e_0 + e_1$$

or as

$$(e_0 - e_1 - e_2) * x = e_0 + e_1, \qquad (1, -1, -1, 0, \ldots) * x = (1, 1, 0, \ldots)$$

when expressed in terms of convolutions. Convoluting with the inverse of
$(1, -1, -1, 0, \ldots)$ gives the terms of the Fibonacci sequence (but note that the
inverse is not in $\ell^1$). Traditionally, “generating functions” are used to get the same
results, the connection being elucidated in Chap. 14.
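
To make the remark about the convolution inverse concrete, here is a minimal sketch that computes the first terms of the inverse of $(1, -1, -1, 0, \ldots)$ term by term. The function names `convolve` and `convolution_inverse` are hypothetical, introduced only for this illustration; exact rational arithmetic is used so that the comparison with the Fibonacci numbers is exact.

```python
from fractions import Fraction

def convolve(a, b, length):
    """First `length` terms of the convolution (a * b)_n = sum_k a_k b_{n-k}."""
    return [sum(a[k] * b[n - k]
                for k in range(n + 1)
                if k < len(a) and n - k < len(b))
            for n in range(length)]

def convolution_inverse(a, length):
    """First `length` terms of the sequence b with a * b = e_0 (needs a[0] != 0)."""
    b = []
    for n in range(length):
        target = 1 if n == 0 else 0
        # Contribution of the already-known terms b_0, ..., b_{n-1}.
        known = sum(a[k] * b[n - k] for k in range(1, min(n, len(a) - 1) + 1))
        b.append(Fraction(target - known, a[0]))
    return b

kernel = [1, -1, -1]                      # the sequence (1, -1, -1, 0, ...)
inverse = convolution_inverse(kernel, 10)
check = convolve(kernel, inverse, 10)     # should reproduce e_0 = (1, 0, 0, ...)
```

The terms of `inverse` grow without bound, which is consistent with the remark that the inverse does not lie in $\ell^1$.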


5. €! contains the space of sequences he = {(an) : (N’an) € £!}, (s > 0), which 


; : 56 
in turn contains ¢° be: 


6. One can show that as $p \to \infty$, $\|x\|_{\ell^p} \to \|x\|_{\ell^\infty}$, if $x$ belongs to some $\ell^q$.


7. The following are some classical criteria for determining that a sequence of measurable
functions $f_n$ that converges pointwise a.e. is Cauchy in $L^1(A)$,

(a) $|f_n|$ are increasing but $\int |f_n|$ are bounded (Monotone Convergence Theorem),

(b) $|f_n| \le g \in L^1(A)$ (Dominated Convergence Theorem),

(c) $\int_E f_n$ converges for all measurable sets $E$ (Vitali’s theorem).

8. A function has both local and global integrability properties: locally about $x \in \mathbb{R}$,
it may belong to some $L^p[x - \delta, x + \delta]$ space, while globally, the sequence of
numbers $a_n := \|f\|_{L^p[n, n+1]}$ may belong to $\ell^q$. For example, $f$ is in $L^1(\mathbb{R})$ when
it is locally in $L^1$ and globally in $\ell^1$. $L^p_{\mathrm{loc}}$ are spaces of functions that are only
locally in $L^p$.

9. The Fourier series maps $\mathcal{F} : L^p[0,1] \to \ell^{p'}$ for $1 < p < 2$ (see Exercise 10.35(14)
for $p = 2$).


Chapter 10 
Hilbert Spaces 


10.1 Inner Products 


There are spaces, such as $\ell^2$, whose norms have special properties because they are
induced from what are termed inner products. Not only do such spaces have a concept
of length but also of orthogonality between vectors.


Definition 10.1 


An inner product on a vector space $X$ is a positive-definite sesquilinear form,¹
namely a map

$$\langle\,\cdot\,,\cdot\,\rangle : X \times X \to \mathbb{F}$$

such that for all $x, y, z \in X$, $\lambda \in \mathbb{F}$,

$$\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle, \qquad \langle x, \lambda y\rangle = \lambda\langle x, y\rangle,$$
$$\langle y, x\rangle = \overline{\langle x, y\rangle}, \qquad \langle x, x\rangle \ge 0; \quad \langle x, x\rangle = 0 \Leftrightarrow x = 0.$$


Easy Consequences

1. If for all $x \in X$, $\langle x, y\rangle = 0$, then $y = 0$.

2. $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$, but $\langle \lambda x, y\rangle = \bar\lambda\langle x, y\rangle$ (conjugate-linear).

3. $\langle x, x\rangle$ is real (and positive); its square-root is denoted by $\|x\| := \sqrt{\langle x, x\rangle}$.

4. $\|\lambda x\| = |\lambda|\,\|x\|$, and $\|x\| = 0 \Leftrightarrow x = 0$.

5. $\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2$.

6. (Pythagoras) If $\langle x, y\rangle = 0$ then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
More generally, if $\langle x_i, x_j\rangle = 0$ for $i \ne j$ then (by induction)

$$\|x_1 + \cdots + x_N\|^2 = \|x_1\|^2 + \cdots + \|x_N\|^2.$$


¹ In the mathematical literature, the inner product is often taken to be linear in the first variable;
this is a matter of convention. The choice adopted here is that of the “physics” community; it
makes many formulas, such as the definition $x^*(y) := \langle x, y\rangle$, more natural and conforming
with function notation.


We will see next that the triangle inequality is also true, making $\|\cdot\|$ a norm, thus
inner product spaces are normed spaces. Two vectors are orthogonal or perpendicular
when $\langle x, y\rangle = 0$, also written as $x \perp y$. More generally, two subsets are said to be
orthogonal, $A \perp B$, when any two vectors, $a \in A$, $b \in B$, are orthogonal, $\langle a, b\rangle = 0$.


Examples 10.2 


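1. The simplest examples are the Euclidean spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ with

$$\left\langle \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}, \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} \right\rangle
:= (\bar a_1 \;\cdots\; \bar a_N) \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} = \sum_{i=1}^N \bar a_i b_i.$$

More generally, take any basis $v_1, \ldots, v_N$ of $\mathbb{F}^N$, expand any two vectors $x$ and
$y$ as $x = \sum_n a_n v_n$, $y = \sum_n b_n v_n$, and define $\langle x, y\rangle := \sum_n \bar a_n b_n$. (The
inner product differs depending on the choice of the basis.)


2. The matrices of size $M \times N$ have an inner product given by

$$\langle A, B\rangle := \sum_{i=1}^M \sum_{j=1}^N \bar A_{ij} B_{ij}.$$

3. ▸ $\ell^2$ has the inner product $\langle (a_n), (b_n)\rangle := \sum_n \bar a_n b_n$. The fact that this series
converges follows from Cauchy’s inequality $|\sum_n \bar a_n b_n| \le \|(a_n)\|\,\|(b_n)\|$.


4. ▸ $L^2(A)$ has the inner product $\langle f, g\rangle := \int_A \bar f g$. That this integral has a finite
value follows from Hölder’s inequality $|\int_A \bar f g| \le \|fg\|_{L^1} \le \|f\|_{L^2}\|g\|_{L^2}$.


5. The weighted $\ell^2$ and $L^2$ spaces generalize these formulae to

$$\langle (a_n), (b_n)\rangle := \sum_n \bar a_n b_n w_n, \qquad \langle f, g\rangle := \int \bar f(x)\, g(x)\, w(x)\,dx$$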


respectively, where w, and w(x) are called weights; what properties do they 
need to have for the inner product axioms to hold? 


Our first proposition generalizes Cauchy’s inequality (Proposition 7.4) from $\ell^2$ to
a general inner product space. It is probably the most used inequality in analysis.
Proposition 10.3 Cauchy-Schwarz inequality


$$|\langle x, y\rangle| \le \|x\|\,\|y\|$$


Proof The inequality need only be shown for $y$ non-zero. Any other vector $x$ can
be decomposed uniquely into two parts, one in the direction of $y$, and the other
perpendicular to it:

$$x = \lambda y + (x - \lambda y), \quad \text{with } \langle x - \lambda y, y\rangle = 0.$$

This yields $\lambda = \langle y, x\rangle / \langle y, y\rangle$. Applying Pythagoras’ theorem, we deduce that

$$\|x\|^2 = \|\lambda y\|^2 + \|x - \lambda y\|^2,$$

hence $\|\lambda y\| \le \|x\|$, or $|\lambda| \le \|x\| / \|y\|$, from which
follows the assertion. □
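
The decomposition used in this proof is easy to try out in $\mathbb{C}^N$. The sketch below is illustrative only: the helper names and the sample vectors are hypothetical, and the inner product is taken conjugate-linear in the first argument, following the convention of Definition 10.1.

```python
import math

def inner(x, y):
    """Inner product on C^N, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x))

def decompose(x, y):
    """Split x into lam*y plus a remainder orthogonal to y (y must be non-zero)."""
    lam = inner(y, x) / inner(y, y)
    parallel = [lam * b for b in y]
    remainder = [a - p for a, p in zip(x, parallel)]
    return lam, parallel, remainder

# Hypothetical sample vectors.
x = [1 + 2j, -1j, 3]
y = [2, 1 - 1j, 1j]
lam, parallel, remainder = decompose(x, y)

orthogonality_defect = abs(inner(remainder, y))                                # ~ 0
pythagoras_defect = abs(norm(x) ** 2 - norm(parallel) ** 2 - norm(remainder) ** 2)  # ~ 0
cauchy_schwarz_holds = abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
```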


Corollary 10.4

$$\|x + y\| \le \|x\| + \|y\|$$


Proof Using the Cauchy-Schwarz inequality, $\mathrm{Re}\,\langle x, y\rangle \le |\langle x, y\rangle| \le \|x\|\|y\|$, so

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$ □

Hence $\|\cdot\|$ is a norm, and all the facts about normed spaces apply to inner product
spaces. For example, the norm is continuous.

Proposition 10.5

The inner product is continuous.

Proof Let $x_n \to x$ and $y_n \to y$, then since $y_n$ are bounded (Example 4.3(5)),

$$|\langle x_n, y_n\rangle - \langle x, y\rangle| \le |\langle x_n - x, y_n\rangle| + |\langle x, y_n - y\rangle|
\le \|x_n - x\|\|y_n\| + \|x\|\|y_n - y\| \to 0.$$ □


It follows that taking limits commutes with the inner product:

$$\lim_{n\to\infty} \langle x_n, y_n\rangle = \langle \lim_{n\to\infty} x_n, \lim_{n\to\infty} y_n\rangle.$$
David Hilbert (1862-1943) studied invariant theory under Lindemann
at Königsberg until 1885. His encyclopedic powers
motivated him to explore much of mathematics; in 1899, in
Göttingen, he gave rigorous axioms for Euclidean geometry;
1904-9, he studied Fredholm’s integral equations, with his student
Schmidt; he defined compact operators, proving they are
limits of matrices, with their spectrum of eigenvalues; (Schmidt)
defined $\ell^2$ with its inner product. On to mathematical physics,
quite possibly he inspired Einstein’s general relativity. His 1918
‘formalist’ research programme set out to prove that set axioms
are consistent, “one can solve any problem by pure thought”.

Fig. 10.1 Hilbert


Definition 10.6 


A Hilbert space is an inner product space which is complete as a metric space. 


In the rest of the text, the letter H denotes a Hilbert space. 


Examples 10.7


1. $\mathbb{R}^N$, $\mathbb{C}^N$, $\ell^2$ and $L^2(\mathbb{R})$ are all Hilbert spaces (Theorem 8.22, 9.8).

2. Every inner product space can be completed to a Hilbert space. In the completion
as a normed space (Proposition 7.17), take $\langle x, y\rangle := \lim_{n\to\infty} \langle x_n, y_n\rangle$, for
representative Cauchy sequences $x = [x_n]$, $y = [y_n]$. Note that $\langle x_n, y_n\rangle$ is a Cauchy
sequence in $\mathbb{C}$ since

$$|\langle x_n, y_n\rangle - \langle x_m, y_m\rangle| \le |\langle x_n, y_n\rangle - \langle x_m, y_n\rangle| + |\langle x_m, y_n\rangle - \langle x_m, y_m\rangle|$$
$$\le \|x_n - x_m\|\|y_n\| + \|x_m\|\|y_n - y_m\| \to 0$$

as $n, m \to \infty$, with $\|x_m\|$, $\|y_n\|$ bounded.

3. ▸ For an inner product space over $\mathbb{C}$, if $\langle x, Tx\rangle = 0$ for all $x \in X$, then $T = 0$.

Proof The identities

$$0 = \langle x + y, T(x + y)\rangle = \langle x, Ty\rangle + \langle y, Tx\rangle,$$
$$0 = \langle x + iy, T(x + iy)\rangle = i\langle x, Ty\rangle - i\langle y, Tx\rangle,$$

together imply $\langle x, Ty\rangle = 0$, for any $x, y \in X$, in particular $\|Ty\|^2 = 0$.

4. An alternative proof of the Cauchy-Schwarz inequality is

$$0 \le \|u - \lambda v\|^2 = 1 - 2\,\mathrm{Re}\,\lambda\langle u, v\rangle + |\lambda|^2$$

for $u := x/\|x\|$, $v := y/\|y\|$ unit vectors and all $\lambda \in \mathbb{F}$, in particular for
$\lambda = |\langle u, v\rangle| / \langle u, v\rangle$.

5. $\|x\| = \sup_{\|y\| = 1} |\langle x, y\rangle|$, with the maximum achieved when $y = x/\|x\|$.

Do all norms on vector spaces come from inner products, and if not, which property
characterizes inner product spaces? The answer is given by


Proposition 10.8 Parallelogram law


A norm is induced from an inner product if, and only if, it satisfies, for all
vectors $x$, $y$,

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$


The statement asserts that the sum of the lengths
squared of the diagonals of a parallelogram equals
that of the sides.

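Proof The parallelogram law follows from adding the identities,

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2,$$
$$\|x - y\|^2 = \|x\|^2 - 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2.$$

Subtracting the two gives $4\,\mathrm{Re}\,\langle x, y\rangle$. This is already sufficient to identify the inner
product when the scalar field is $\mathbb{R}$. Over $\mathbb{C}$, notice that $\mathrm{Im}\,\langle x, y\rangle = -\mathrm{Re}\, i\langle x, y\rangle =
\mathrm{Re}\,\langle ix, y\rangle$, so

$$\langle x, y\rangle = \tfrac{1}{4}\left( \|y + x\|^2 - \|y - x\|^2 + i\|y + ix\|^2 - i\|y - ix\|^2 \right). \tag{10.1}$$

This remarkable polarization identity expresses the inner product purely in terms of
norms. Accordingly, for the converse of the proposition, define

for any normed space, $\langle\!\langle x, y\rangle\!\rangle := \tfrac{1}{4}\left( \|y + x\|^2 - \|y - x\|^2 \right)$,
for a complex space, $\langle x, y\rangle := \langle\!\langle x, y\rangle\!\rangle + i\langle\!\langle ix, y\rangle\!\rangle$.

Two of the inner product axioms follow from $\langle\!\langle y, x\rangle\!\rangle = \langle\!\langle x, y\rangle\!\rangle$ and $\langle x, x\rangle =
\langle\!\langle x, x\rangle\!\rangle = \|x\|^2$, as well as $\langle x, 0\rangle = \langle\!\langle x, 0\rangle\!\rangle = 0$; $\langle y, x\rangle = \overline{\langle x, y\rangle}$ is readily
verified using

$$4\langle\!\langle iy, x\rangle\!\rangle = \|x + iy\|^2 - \|x - iy\|^2 = \|y - ix\|^2 - \|y + ix\|^2 = -4\langle\!\langle ix, y\rangle\!\rangle.$$

Showing that linearity holds when the parallelogram law is satisfied is the hardest
part of the proof. Writing

$$2y + x = (y + z + x) + (y - z),$$
$$2z + x = (y + z + x) - (y - z),$$

and using the parallelogram law,

$$4\langle\!\langle x, 2y\rangle\!\rangle + 4\langle\!\langle x, 2z\rangle\!\rangle = \|2y + x\|^2 - \|2y - x\|^2 + \|2z + x\|^2 - \|2z - x\|^2$$
$$= \|2y + x\|^2 + \|2z + x\|^2 - \|2y - x\|^2 - \|2z - x\|^2$$
$$= 2\|y + z + x\|^2 + 2\|y - z\|^2 - 2\|y + z - x\|^2 - 2\|y - z\|^2$$
$$= 8\langle\!\langle x, y + z\rangle\!\rangle.$$

In particular, putting $z = 0$ gives $\langle\!\langle x, 2y\rangle\!\rangle = 2\langle\!\langle x, y\rangle\!\rangle$, reducing the above identity to

$$\langle\!\langle x, y + z\rangle\!\rangle = \langle\!\langle x, y\rangle\!\rangle + \langle\!\langle x, z\rangle\!\rangle. \tag{10.2}$$

By induction, it follows that $\langle\!\langle x, ny\rangle\!\rangle = n\langle\!\langle x, y\rangle\!\rangle$ for $n \in \mathbb{N}$. For the negative integers,

$$\langle\!\langle x, -y\rangle\!\rangle = \tfrac{1}{4}\left( \|-y + x\|^2 - \|-y - x\|^2 \right) = -\langle\!\langle x, y\rangle\!\rangle$$

while for rational numbers $p = m/n$, $m, n \in \mathbb{Z}$, $n \ne 0$,

$$n\langle\!\langle x, \tfrac{m}{n} y\rangle\!\rangle = \langle\!\langle x, my\rangle\!\rangle = m\langle\!\langle x, y\rangle\!\rangle$$

so $\langle\!\langle x, py\rangle\!\rangle = p\langle\!\langle x, y\rangle\!\rangle$. Note that $\langle\!\langle x, y\rangle\!\rangle$ is continuous in $x$ and $y$ since the norm is
continuous, so if the rational numbers $p_n \to \alpha \in \mathbb{R}$, then

$$\langle\!\langle x, \alpha y\rangle\!\rangle = \lim_{n\to\infty} \langle\!\langle x, p_n y\rangle\!\rangle = \lim_{n\to\infty} p_n \langle\!\langle x, y\rangle\!\rangle = \alpha\langle\!\langle x, y\rangle\!\rangle.$$

This completes the proof when the scalar field is $\mathbb{R}$. Over the complex numbers,
$\langle x, \lambda y\rangle = \lambda\langle x, y\rangle$ for $\lambda \in \mathbb{C}$ is evident from (10.1), (10.2), and

$$\langle x, iy\rangle = -\langle\!\langle ix, y\rangle\!\rangle + i\langle\!\langle x, y\rangle\!\rangle = i\langle x, y\rangle.$$ □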
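
The polarization identity (10.1) can be tested directly in $\mathbb{C}^N$ by computing the right-hand side from norms alone and comparing it with the inner product. This is a sketch under assumptions: the helper names and sample vectors are hypothetical, and the inner product is conjugate-linear in its first argument as in Definition 10.1.

```python
def inner(x, y):
    """Inner product on C^N, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm_sq(x):
    return sum(abs(a) ** 2 for a in x)

def y_plus(c, x, y):
    """The vector y + c*x."""
    return [b + c * a for a, b in zip(x, y)]

def polarization(x, y):
    """Right-hand side of identity (10.1), built from norms only."""
    return (norm_sq(y_plus(1, x, y)) - norm_sq(y_plus(-1, x, y))
            + 1j * norm_sq(y_plus(1j, x, y)) - 1j * norm_sq(y_plus(-1j, x, y))) / 4

def parallelogram_defect(x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero for a norm from an inner product."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm_sq(plus) + norm_sq(minus) - 2 * (norm_sq(x) + norm_sq(y))

# Hypothetical sample vectors.
x = [1 + 2j, -1j, 3]
y = [2, 1 - 1j, 1j]
recovered = polarization(x, y)
```

Note that the order of the arguments matters: with the convention used here, the terms in $i$ pick out $\mathrm{Im}\,\langle x, y\rangle$, not its negative.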


In a sense, it is the presence of orthogonality that distinguishes inner product
spaces from normed ones. By the polarization identity, two vectors are perpendicular
when $\|x + y\| = \|x - y\|$ and $\|x + iy\| = \|x - iy\|$. Each vector, and more generally
each subspace, is complemented by a subspace of those vectors that are perpendicular
to it.
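Proposition 10.9 Properties of Orthogonal Spaces


The orthogonal spaces of subsets $A \subseteq X$,

$$A^\perp := \{\, x \in X : \langle x, a\rangle = 0, \ \forall a \in A \,\},$$

satisfy

(i) $A \cap A^\perp \subseteq 0$,

(ii) $A \subseteq B \Rightarrow B^\perp \subseteq A^\perp$, and $A \subseteq A^{\perp\perp}$,

(iii) $A^\perp$ is a closed subspace of $X$,

(iv) $A^\perp = \overline{[\![A]\!]}^\perp$.


Proof (i) If a vector $a \in A$ is also in $A^\perp$, then it is orthogonal to all vectors in $A$,
including itself, $\langle a, a\rangle = 0$, so $a = 0$.

(ii) If $a \in A \subseteq B$ and $x \in B^\perp$, then $\langle x, a\rangle = 0$, so $x \in A^\perp$. For any $a \in A$ and
$x \in A^\perp$, $\langle a, x\rangle = \overline{\langle x, a\rangle} = 0$, so $a \in A^{\perp\perp}$.

(iii) If $x$ and $y$ are in $A^\perp$ and $a \in A$, then

$$\langle \lambda x, a\rangle = \bar\lambda\langle x, a\rangle = 0, \qquad \langle x + y, a\rangle = \langle x, a\rangle + \langle y, a\rangle = 0,$$

so $\lambda x, x + y \in A^\perp$. If $x_n \in A^\perp$ and $x_n \to x$, then $0 = \langle x_n, a\rangle \to \langle x, a\rangle$, and
$x \in A^\perp$.

(iv) That $\overline{[\![A]\!]}^\perp \subseteq A^\perp$ follows from $A \subseteq \overline{[\![A]\!]}$. Conversely, let $x \in A^\perp$; for any
$a, b \in A$,

$$\langle x, a + b\rangle = \langle x, a\rangle + \langle x, b\rangle = 0, \qquad \langle x, \lambda a\rangle = \lambda\langle x, a\rangle = 0,$$

so $x$ is orthogonal to the space generated by $A$, $x \in [\![A]\!]^\perp$. Let $a_n \to y$ with
$a_n \in [\![A]\!]$, then $0 = \langle x, a_n\rangle \to \langle x, y\rangle$ and $x \in \overline{[\![A]\!]}^\perp$. □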


Exercises 10.10 


1. If T, S : X — Y are linear maps on inner product spaces such that (y, Tx) = 
(y, Sx) for all x € X,y € Y, then T = S. Example 10.7(3) is false for real 
spaces: Find a non-zero 2 x 2 real matrix T such that (x, Tx) = 0 for all 
x €R?, 


2. The Cauchy-Schwarz inequality becomes an equality if, and only if, $x = \lambda y$
for some scalar $\lambda$ (or $y = 0$). Similarly, $\|x + y\| = \|x\| + \|y\|$ precisely when
$x = \lambda y$, $\lambda \ge 0$. More generally, $\|\sum_n x_n\| = \sum_n \|x_n\|$ if, and only if, $x_n = \lambda_n x$
for some $\lambda_n \ge 0$.

3. For any $x, y \in H$, find scalars $|\lambda| = 1 = |\mu|$ such that $\|x\|^2 + \|y\|^2 \le \|\lambda x + \mu y\|^2$.

4. A vector space may have various inner products. When $T : X \to X$ is 1-1
and linear, $\langle\!\langle x, y\rangle\!\rangle := \langle Tx, Ty\rangle$ is another legitimate inner product on $X$. What
properties does $S$ need to have to ensure that $\langle\!\langle x, y\rangle\!\rangle := \langle x, Sy\rangle$ is also an inner
product?

5. * Every inner product on $\mathbb{R}^N$ is of the type $\langle x, Ay\rangle = \sum_{i,j} A_{ij} a_i b_j$ where $A$ is
a positive symmetric matrix. Deduce that balls have the shape of an ellipse in
$\mathbb{R}^2$, and of an ellipsoid in $\mathbb{R}^3$.

6. ▸ The product of two inner product spaces, $X \times Y$, has an inner product defined
by

$$\left\langle \binom{x_1}{y_1}, \binom{x_2}{y_2} \right\rangle := \langle x_1, x_2\rangle_X + \langle y_1, y_2\rangle_Y.$$

Then the maps $x \mapsto \binom{x}{0}$ and $y \mapsto \binom{0}{y}$ embed $X$ and $Y$ as orthogonal subspaces
of $X \times Y$. Although the induced norm is not the same one we defined for $X \times Y$
as normed spaces (Example 7.3(8)), the two norms are equivalent.
When $X$, $Y$ are complete, so is $X \times Y$ with the induced norm (note that
$\|x\| \le \|\binom{x}{y}\|$).

7. In any inner product space,

(a) $\|x - y\|^2 + \|x + y - 2z\|^2 = 2\|x - z\|^2 + 2\|y - z\|^2$,
(b) $\|x + y + z\|^2 + \|x + y - z\|^2 + \|x - y + z\|^2 + \|x - y - z\|^2
= 4(\|x\|^2 + \|y\|^2 + \|z\|^2)$.

8. Verify that the norms for $\ell^2$ and $L^2(\mathbb{R})$ satisfy the parallelogram law, and show
that the inner product obtained from the polarization identity is the same one
defined previously (Examples 10.2(3, 4)).

9. The 1-norm and $\infty$-norm defined on $\mathbb{R}^2$ do not come from inner products. Find
two vectors that do not satisfy the parallelogram law.

10. ▸ Similarly, $\ell^1$, $\ell^\infty$, $L^1(\mathbb{R})$ and $L^\infty(\mathbb{R})$ are not inner product spaces. Neither is
$B(X, Y)$ in general.

11. A norm $\|\cdot\|$ that satisfies the parallelogram law gives rise to its associated inner
product, by the polarization identity. In turn, this inner product induces the norm
$|||x||| := \sqrt{\langle x, x\rangle}$. Show that the two norms are identical.

12. The polynomials $x$ and $2x^2 - 1$ are orthogonal in $L^2[0, 1]$. So are sine and cosine
in the space $L^2[-\pi, \pi]$; can you find a function orthogonal to both?

13. $0^\perp = X$, $X^\perp = 0$. In fact, $A^\perp = X \Leftrightarrow A \subseteq \{0\}$. Do you think it is true that
$A^\perp = 0 \Leftrightarrow A = X$? What if $A$ is a closed linear subspace of $X$?
14. Show that (a) $(A + B)^\perp = A^\perp \cap B^\perp$, (b) $A^{\perp\perp\perp} = A^\perp$. (Hint: use property (ii)
of the proposition.)


15. Let $d := d(x, [\![y]\!]) = \inf_\lambda \|x + \lambda y\|$, where $y$ is a unit vector; show that (a)
$d = \|x + \lambda_0 y\|$ for some $\lambda_0$, (b) $|\langle x, y\rangle|^2 = \|x\|^2 - d^2$, and (c) $y \perp (x + \lambda_0 y)$.


16. To illustrate the strength of orthogonality, prove that if $M \perp N$ are orthogonal
complete subspaces of $X$, then $M + N$ is also complete (Example 7.11(2)).


17. Suppose a vector space $X$ satisfies all the axioms for an inner product space
except that it contains non-zero vectors with $\langle x, x\rangle = 0$. Show that if $\langle x, x\rangle = 0$,
then $\forall y$, $\langle x, y\rangle = 0$ (Hint: expand $\|y - \lambda x\|^2$).

Deduce that Pythagoras’ theorem and Cauchy-Schwarz’s inequality remain
valid. Show that $Z := \{\, x : \langle x, x\rangle = 0 \,\}$ is a closed linear subspace, and that
there is a well-defined inner product on $X/Z$, $\langle x + Z, y + Z\rangle := \langle x, y\rangle$.


18. A light ‘ray’ has a frequency profile $f(\omega)$. Oversimplifying slightly, our eyes
convert it to a color vector $(\langle r, f\rangle, \langle g, f\rangle, \langle b, f\rangle)$ where $r(\omega)$, $g(\omega)$, $b(\omega)$ are
the absorption profiles of the retinal cones. So any two points (rays) in the coset
$f + \{r, g, b\}^\perp$ have the same color.


10.2 Least Squares Approximation 


By Exercise 10.10(15) above, the distance between a point and a line can be min- 
imized by a unique point on the line. This has a generalization with far-reaching 
consequences: 


Theorem 10.11 


If $M$ is a closed convex subset of a Hilbert space $H$, then any point in $H$
has a unique point in $M$ which is closest to it,

$$\forall x \in H, \ \exists!\, y_* \in M, \ \forall y \in M, \quad \|x - y_*\| \le \|x - y\|.$$


Proof Let $d := d(x, M) = \inf_{y \in M} \|x - y\|$ be the smallest distance from $M$ to $x$.
Then there is a sequence of vectors $y_n \in M$ such that $\|x - y_n\| \to d$. Now, using
the parallelogram law and the convexity of $M$, $(y_n)$ is a Cauchy sequence,

$$\|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - \|(y_n + y_m) - 2x\|^2$$
$$= 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\left\|\tfrac{y_n + y_m}{2} - x\right\|^2$$
$$\le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4d^2$$
$$\to 0, \quad \text{as } n, m \to \infty.$$

But $M$ is closed (hence complete) and so $y_n \to y_* \in M$. It follows, by continuity of
the norm, that $\|x - y_*\| = \lim_{n\to\infty} \|x - y_n\| = d$.

Suppose $a \in M$ is another closest point to $x$, i.e., $\|x - a\| = d$. Then $y_* = a$
since

$$\|y_* - a\|^2 = 2\|y_* - x\|^2 + 2\|a - x\|^2 - \|(y_* + a) - 2x\|^2$$
$$\le 2\|y_* - x\|^2 + 2\|a - x\|^2 - 4d^2$$
$$= 0.$$ □
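
As a small finite-dimensional illustration of the theorem, consider the box $[0,1]^N \subset \mathbb{R}^N$, which is closed and convex. For this particular set the closest point can be written down explicitly by clipping each coordinate; this formula is an assumption of the sketch, not something taken from the text. The code below compares the clipped point against randomly sampled points of the box; all names are hypothetical.

```python
import math
import random

def project_onto_box(x, lo, hi):
    """Closest point to x in the box [lo, hi]^N, obtained by clipping each coordinate."""
    return [min(max(t, lo), hi) for t in x]

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [2.0, -0.5, 0.25]                       # a point outside the box
y_star = project_onto_box(x, 0.0, 1.0)      # candidate closest point

# Compare against many points of the box (fixed seed for reproducibility).
rng = random.Random(0)
samples = [[rng.uniform(0.0, 1.0) for _ in x] for _ in range(1000)]
no_sample_is_closer = all(dist(x, y_star) <= dist(x, y) + 1e-12 for y in samples)
```

Sampling cannot prove minimality, of course; it only fails to find a counterexample, which is what the theorem predicts.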


Let us concentrate on the special case when M is a closed subspace of H. 


Theorem 10.12 


When $M$ is a closed linear subspace of a Hilbert space $H$, then $y \in M$ is
the closest point $y_*$ to $x \in H$ if, and only if,

$$x - y \in M^\perp.$$

The map $P : x \mapsto y_*$ is a continuous ‘orthogonal’ projection with $\mathrm{im}\,P =
M$ orthogonal to $\ker P = M^\perp$, so

$$H = M \oplus M^\perp.$$


Proof (i) Let $a$ be any non-zero point of $M$ and let
$b := x - (y_* + \lambda a)$ where $\lambda$ is chosen so that $a \perp b$, that
is, $\lambda := \langle a, x - y_*\rangle / \|a\|^2$. By Pythagoras’ theorem,
we get

$$\|x - y_*\|^2 = \|b + \lambda a\|^2 = \|b\|^2 + \|\lambda a\|^2 \ge \|b\|^2$$

making $y_* + \lambda a$ even closer to $x$ than the closest point
$y_*$, unless $\lambda = 0$, i.e., $\langle a, x - y_*\rangle = 0$. Since $a$ is
arbitrary, this gives $x - y_* \perp M$.

Conversely, if $(x - y) \perp a'$ for any $a' \in M$, then $(x - y) \perp (a' - y)$ and
Pythagoras’ theorem implies

$$\|x - a'\|^2 = \|x - y\|^2 + \|y - a'\|^2,$$

so that $\|x - y\| \le \|x - a'\|$, making $y$ the closest point in $M$ to $x$.


(ii) By the above, for any $x \in H$, $P(x)$ is that unique vector in $M$ such that $x - P(x) \in
M^\perp$. This characteristic property has the following consequences:

• $P$ is linear since

$$(x + y) - (Px + Py) = (x - Px) + (y - Py) \in M^\perp, \qquad Px + Py \in M,$$

hence $P(x + y) = Px + Py$. Similarly, $P(\lambda x) = \lambda Px$.

• The closest point in $M$ to $a \in M$ is $a$ itself, i.e., $Pa = a$, so $\mathrm{im}\,P = M$. Since
$Px \in M$, it also follows that $P^2 x = Px$, and $P^2 = P$.

• When $x \in M^\perp$, then $x - 0 \in M^\perp$ and $0 \in M$ so $Px = 0$. As $Px = 0$ implies
$x = x - Px \in M^\perp$, this justifies $\ker P = M^\perp$.

• $P$ is continuous since $\|x\|^2 = \|x - Px\|^2 + \|Px\|^2$ by Pythagoras’ theorem so
that $\|Px\| \le \|x\|$.

Finally $H = \mathrm{im}\,P \oplus \ker P = M \oplus M^\perp$, since any vector can be decomposed as
$x = Px + (x - Px)$, and $M \cap M^\perp = 0$. □
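
The properties of $P$ listed in the proof can be observed in a simple finite-dimensional case. The sketch below takes, as an assumed example, the subspace $M$ of zero-sum vectors in $\mathbb{R}^N$, whose orthogonal complement is spanned by $(1, \ldots, 1)$; the projection then subtracts the mean. The function names and the sample vector are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_zero_sum(x):
    """Orthogonal projection of x onto M = {v in R^N : v_1 + ... + v_N = 0}."""
    mean = sum(x) / len(x)
    return [t - mean for t in x]

x = [3.0, -1.0, 4.0, 2.0]
Px = project_zero_sum(x)                      # component in M
residual = [a - b for a, b in zip(x, Px)]     # x - Px, expected to lie in M-perp

idempotent = project_zero_sum(Px) == Px       # P^2 = P
m_vector = [1.0, -2.0, 0.5, 0.5]              # some element of M (its entries sum to 0)
residual_dot_m = dot(residual, m_vector)      # expected 0: x - Px is orthogonal to M
pythagoras_gap = dot(x, x) - dot(Px, Px) - dot(residual, residual)   # expected 0
```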


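Corollary 10.13

For any subset $A \subseteq H$, $A^{\perp\perp} = \overline{[\![A]\!]}$.


Proof Let $M$ be a closed linear subspace of a Hilbert space $H$. By Proposition 10.9,
$M \subseteq M^{\perp\perp}$, so we require the opposite inclusion. Let $x \in M^{\perp\perp}$, then $x = a + b$
where $a \in M$ and $b \in M^\perp$, and

$$0 = \langle b, x\rangle = \langle b, a\rangle + \langle b, b\rangle = \|b\|^2,$$

forcing $b = 0$ and $x \in M$; thus $M^{\perp\perp} \subseteq M$. In particular, $A^{\perp\perp} = \overline{[\![A]\!]}^{\perp\perp} = \overline{[\![A]\!]}$. □


Note that $M^\perp = 0 \Leftrightarrow M = M^{\perp\perp} = 0^\perp = H$, answering Exercise 10.10(13) in
the case of a closed linear subspace of a Hilbert space.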


Examples 10.14

1. Let $M := \{\, f \in L^2[0, 1] : \int_0^1 f = 0 \,\}$. To find that function $f_0$ in $M$ which most
closely approximates a given function $g$, we first note

$$M = \{\, f \in L^2[0, 1] : \langle 1, f\rangle = 0 \,\} = \{1\}^\perp, \quad \text{so } M^\perp = [\![1]\!].$$

Then $f_0$ must satisfy $f_0 \in M$ and $g - f_0 \in M^\perp$, i.e., $f_0 = g + \lambda$ and

$$0 = \int_0^1 f_0 = \int_0^1 g + \lambda, \quad \text{hence } f_0 = g - \int_0^1 g.$$


2. The “affine” projection onto a plane with equation $x \cdot n = d$ ($n$ a unit vector) is
given by $P(x) := x + (d - x \cdot n)\,n$.

Proof Translate all points $x \mapsto y := x - dn$, so that the plane becomes the
subspace $M$ with equation $y \cdot n = 0$, i.e., $M = \{n\}^\perp$. The required point
satisfies $(y - y_0) \cdot \tilde y = 0$ for all $\tilde y \in M$, so $y_0 = y + \alpha n$. Dotting with $n$ implies
$\alpha = -y \cdot n = d - x \cdot n$, which can be substituted into $x_0 = x + \alpha n$.


3. A projection is orthogonal if, and only if, $\|P\| = 1$ (unless $P = 0$).

Proof Using $\langle x - Px, Px\rangle = 0$ and the Cauchy-Schwarz inequality,

$$\|Px\|^2 = \langle x, Px\rangle \le \|x\|\|Px\|,$$

so $\|Px\| \le \|x\|$; but $Px = x$ for $x \in \mathrm{im}\,P$, so $\|P\| = 1$. Conversely, let
$a \in \ker P$, $b \in \mathrm{im}\,P$; then for any $\lambda$,

$$\|b\|^2 = \|P(\lambda a + b)\|^2 \le \|\lambda a + b\|^2 = |\lambda|^2\|a\|^2 + 2\,\mathrm{Re}\,\lambda\langle b, a\rangle + \|b\|^2$$

and after letting $\lambda = |\lambda| e^{i\theta}$ with $|\lambda| \to 0$, we find $\mathrm{Re}\, e^{i\theta}\langle b, a\rangle \ge 0$ for any $\theta$,
hence $\langle b, a\rangle = 0$.

4. ▸ $[\![A]\!]$ is dense in $H$ if, and only if, $A^\perp = 0$.

Proof If $A^\perp = 0$, then $\overline{[\![A]\!]} = A^{\perp\perp} = 0^\perp = H$. Conversely, if $[\![A]\!]$ is dense in $H$,
then $A^\perp = \overline{[\![A]\!]}^\perp = H^\perp = 0$.
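
The affine projection formula of Example 10.14(2) is short enough to try out directly. The following sketch works in $\mathbb{R}^3$ with a hypothetical unit normal and offset; the function name `affine_projection` is an illustrative choice.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def affine_projection(x, n, d):
    """Closest point to x on the plane {p : p.n = d}; n is assumed to be a unit vector."""
    step = d - dot(x, n)
    return [a + step * b for a, b in zip(x, n)]

n = [2 / 3, 1 / 3, -2 / 3]        # hypothetical unit normal: (2, 1, -2)/3
d = 1.0
x = [3.0, 0.0, 3.0]

p = affine_projection(x, n, d)    # lies on the plane
offset = [a - b for a, b in zip(x, p)]   # x - p, a multiple of n
```

Applying the map a second time leaves `p` unchanged, since `d - dot(p, n)` is then zero up to rounding.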


Application: Least Squares Approximation 


A common problem in mathematical applications is to approximate a generic vector 
x by one which is more easily handled, such as a linear combination of simpler 


vectors y,,..., yy. For Hilbert spaces, there is a guarantee that a unique closest 
approximation exists, and this lies at the heart of the method of least squares. 
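Let $M := [\![y_1, \ldots, y_N]\!]$, a closed linear subspace of $H$; then the closest point
in $M$ to $x$ is $y_* = \sum_{j=1}^N \alpha_j y_j$ such that $x - y_* \perp M$. Since $M$ is generated by
$y_1, \ldots, y_N$, this is equivalent to

$$\langle y_i, x - y_*\rangle = 0, \quad i = 1, \ldots, N,$$
$$\langle y_i, x\rangle = \langle y_i, y_*\rangle = \sum_{j=1}^N \langle y_i, y_j\rangle\, \alpha_j.$$

These $N$ linear equations in the $N$ unknowns $\alpha_1, \ldots, \alpha_N$, can be recast in matrix form,

$$\begin{pmatrix} \langle y_1, y_1\rangle & \cdots & \langle y_1, y_N\rangle \\ \vdots & & \vdots \\ \langle y_N, y_1\rangle & \cdots & \langle y_N, y_N\rangle \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} = \begin{pmatrix} \langle y_1, x\rangle \\ \vdots \\ \langle y_N, x\rangle \end{pmatrix}.$$

Given $x$, the coefficients $\alpha_j$ can be found by solving these equations. The Gram
matrix $[\langle y_i, y_j\rangle]$, and possibly its inverse, need only be calculated once, and used to
approximate other points.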
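
For two real vectors the Gram system is only $2 \times 2$ and can be solved by Cramer's rule. The sketch below does this in $\mathbb{R}^3$ with exact rational arithmetic; the function name and the sample vectors are hypothetical, and the real case is assumed so that the Gram matrix is symmetric.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_in_span(y1, y2, x):
    """Coefficients (a1, a2) of the closest point a1*y1 + a2*y2 to x (real vectors)."""
    g11, g12, g22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)   # Gram matrix entries
    b1, b2 = dot(y1, x), dot(y2, x)                         # right-hand side
    det = g11 * g22 - g12 * g12   # non-zero when y1, y2 are linearly independent
    a1 = Fraction(b1 * g22 - g12 * b2, det)
    a2 = Fraction(g11 * b2 - g12 * b1, det)
    return a1, a2

y1 = [1, 0, 1]
y2 = [0, 1, 1]
x = [1, 2, 0]

a1, a2 = closest_in_span(y1, y2, x)
closest = [a1 * u + a2 * v for u, v in zip(y1, y2)]
residual = [t - c for t, c in zip(x, closest)]   # expected to be orthogonal to y1 and y2
```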
Example The space of cubic polynomials, $a + bx + cx^2 + dx^3$, is a four-dimensional
closed linear subspace of the Hilbert space $L^2[0, 1]$, with basis $1, x, x^2, x^3$. Their
Gram matrix and inverse are given by

$$\begin{pmatrix} 1 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \\ 1/3 & 1/4 & 1/5 & 1/6 \\ 1/4 & 1/5 & 1/6 & 1/7 \end{pmatrix}^{-1}
= \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix}$$

So, to approximate the sine function by a cubic polynomial over the region $[0, 1]$, we
first calculate $\langle x^i, \sin\rangle_{L^2[0,1]}$, which work out to $(0.460, 0.301, 0.223, 0.177)$, and
then apply the inverse of the Gram matrix to it, giving

$$p(x) \approx -0.000253 + 1.005x - 0.0191x^2 - 0.144x^3.$$

Notice that the coefficients are close to, but not the same, as the first terms of
the MacLaurin expansion of sine. The difference is that, whereas the MacLaurin
expansion is accurate at 0 and becomes progressively worse away from it,
the $L^2$-approximation balances out the ‘root-mean-square error’ throughout the
region $[0, 1]$.
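
The numbers in this example can be reproduced in a few lines. In the sketch below the Gram matrix is built exactly from $\langle x^i, x^j\rangle = 1/(i+j+1)$, the quoted inverse is checked against it, and the moments $\langle x^i, \sin\rangle$ are evaluated from closed forms obtained by integration by parts (these closed forms are worked out for this illustration and are not taken from the text).

```python
import math
from fractions import Fraction

# Gram matrix of 1, x, x^2, x^3 in L^2[0,1]: <x^i, x^j> = 1/(i+j+1).
gram = [[Fraction(1, i + j + 1) for j in range(4)] for i in range(4)]

# Inverse of the Gram matrix, as quoted in the example.
gram_inverse = [
    [16, -120, 240, -140],
    [-120, 1200, -2700, 1680],
    [240, -2700, 6480, -4200],
    [-140, 1680, -4200, 2800],
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

product = matmul(gram, gram_inverse)   # should be the 4x4 identity, exactly

# Moments <x^i, sin> on [0,1], from integration by parts.
c, s = math.cos(1.0), math.sin(1.0)
moments = [1 - c, s - c, c + 2 * s - 2, 5 * c - 3 * s]

# Coefficients of the best cubic approximation a + b x + c x^2 + d x^3.
coeffs = [sum(row[j] * moments[j] for j in range(4)) for row in gram_inverse]
```

Because the Gram matrix is a (badly conditioned) Hilbert matrix, the coefficients arise from heavy cancellation; with only three significant digits in the moments the result would be far less accurate than with full floating-point precision.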


Exercises 10.15 


1. Find the closest point in the plane $2x + y - 3z = 0$ to a point $x \in \mathbb{R}^3$. (Hint:
Find $M^\perp$.)

2. Let (a) $M := [\![y]\!]$, or (b) $M := \{y\}^\perp$, where $y$ is a unit vector. The orthogonal
projection $P$ which maps any point $x$ to its closest point in $M$ is (a) $Px = \langle y, x\rangle y$,
(b) $Px = x - \langle y, x\rangle y$.

3. ▸ In the decomposition $x = a + b$ with $a \in M$ and $b \in M^\perp$, $a$ and $b$ are unique.
Deduce that if $H = M \oplus N$, where $M$ is a closed linear subspace and $M \perp N$,
then $N = M^\perp$.

4. Let $a + M$ be a coset of a closed linear subspace $M$. Show that there is a
unique vector $x \in a + M$ with smallest norm. (Hint: this is equivalent to finding
the closest vector in $M$ to $-a$.) Deduce that Riesz’s lemma (Proposition 8.20)
continues to hold in a Hilbert space even when $c = 1$.

5. If $M \subseteq N$ are both closed linear subspaces, then $M \oplus (M^\perp \cap N) = N$.

6. Let $T$ be a square matrix, and suppose both subspaces $M$ and $M^\perp$ are $T$-invariant,
so that $T$ takes the schematic form $\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$. Show that $\|T\| = \max(\|A\|, \|B\|)$.
(Hint: take $x = a + b$, then $\|Tx\|^2 = \|Ta\|^2 + \|Tb\|^2$.)

7. ▸ There is a 1-1 correspondence between closed linear subspaces of a Hilbert
space and orthogonal projections (onto them). Properties about subspaces are
reflected as properties of the projections, e.g. if the orthogonal projections $P_M$
and $P_N$ project onto $M$ and $N$ respectively, then

(a) $M \subseteq N \Leftrightarrow P_M P_N = P_M = P_N P_M$,

(b) $M \perp N \Leftrightarrow P_N P_M = 0 = P_M P_N$,

(c) $N = M^\perp \Leftrightarrow I = P_M + P_N$,

(d) $M$ is $T$-invariant $\Leftrightarrow T P_M = P_M T P_M$,

(e) $M$ and $M^\perp$ are both $T$-invariant $\Leftrightarrow T P_M = P_M T$.

8. (a) Since $\langle x, a\rangle = \langle Px, a\rangle$ for any point $a$ in a closed linear subspace $M$, it follows
that $|\langle x, a\rangle| \le \|Px\|\|a\|$ with equality when $a \in [\![Px]\!]$. Deduce that in
a real Hilbert space, the angle between $x$ and $a$ is at least $\cos^{-1}(\|Px\| / \|x\|)$.

(b) Let $H = M \oplus N$ with $M$, $N$ non-zero closed subspaces. Show that there is
a minimum distance $d > 0$ between the disjoint closed sets $S_M \subset M$ and
$S_N \subset N$; thus for any unit vectors $x \in M$, $y \in N$, $\|x - y\| \ge d > 0$.
Deduce that $\mathrm{Re}\,\langle x, y\rangle \le \alpha := 1 - d^2/2$, and hence that

$$\forall x \in M, \ \forall y \in N, \quad |\langle x, y\rangle| \le \alpha\|x\|\|y\|.$$


The main theorem, which does not refer to inner products, is not true in Banach 
spaces in general. 


(a) In R? with the 1-norm, the vector (a) has many closest vectors in the closed 
ball B,[0]. 

(b) In €%, there are many sequences in co that have the minimum distance to 
(Ly Teves): 

(c) Show that, in a normed space, the set of best approximations in a convex set 
M to a point x is convex. 

(d) * On the other hand, in €°, the sequence 0 has no closest sequence in the 
closed convex set M := { (an) € co: >), An/2” = 1}. 


* Consider two orthogonal projections $P$ and $Q$ in $\mathbb{R}^N$. Show that the iteration
$y_{n+1} := QPy_n$, starting from $y_0 = x$ converges to a point $x_* \in \mathrm{im}\,P \cap \mathrm{im}\,Q$.


Find 


(a) the best-fitting quadratic and cubic polynomials to the sine function in 
[0, 27], 


(b) the linear combination of sin and cos which is closest to 1 — x? in L?[0, 1]. 
12. (a) The Gram matrix of vectors $y_1, \ldots, y_N$ is $G := A^*A$ where the columns
of $A$ are $y_n$, and the rows of $A^*$ are $y_n^*$. It is invertible when $y_n$ are linearly
independent.

(b) Show that in order to write a vector $x$ as a linear combination of basis vectors
$x = \sum_n \alpha_n y_n$, given the numbers $b_n := \langle y_n, x\rangle$, then one needs to solve
the matrix equation $G\alpha = b$.

(c) Given the total mass and moment of inertia of a radially symmetric planar
object,

$$M = 2\pi \int_0^R \rho(r)\, r\,dr = 2\pi \langle r, \rho(r)\rangle_{L^2[0, R]},$$
$$I = 2\pi \int_0^R \rho(r)\, r^3\,dr = 2\pi \langle r^3, \rho(r)\rangle_{L^2[0, R]},$$

find an estimate of $\rho(r)$ as some function $\alpha + \beta r$.


13. The symmetric Gram matrix of a set of vectors $x_n \in \mathbb{R}^N$ is useful in other
contexts as well. Show how to recover

(a) the vectors $x_n$ from their Gram matrix, up to an isomorphism (use diagonalization
to find $A$ such that $A^*A = G$),

(b) the Gram matrix of the vectors from the mutual distances between vectors
$d_{ij}$, and their norms $r_i$,

(c) the Gram matrix from $d_{ij}$ only, assuming $\sum_n x_n = 0$.

This is essentially what is done in the Global Positioning System, when 3 to 4
distances obtained by time-lags from satellites are converted to a position.


10.3 Duality H* + H 


An inner product is a function acting on two variables. But if one input vector is 
fixed, it becomes a scalar-valued function on vectors, indeed a continuous functional 


$$x^* : X \to \mathbb{F}, \qquad y \mapsto \langle x, y\rangle.$$


This is linear by the inner product axioms, while continuity follows from the Cauchy-Schwarz
inequality $|x^* y| = |\langle x, y\rangle| \le \|x\|\|y\|$.
Are there any other functionals besides these? Not when the space is complete: 
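Theorem 10.16 Riesz representation theorem


Every continuous functional of a Hilbert space $H$ is of the form $x^* := \langle x, \cdot\,\rangle$,

$$\forall \phi \in H^*, \ \exists!\, x \in H, \quad \phi = \langle x, \cdot\,\rangle.$$

The Riesz map

$$J : H \to H^*, \qquad x \mapsto x^*$$

is a bijective conjugate-linear isometry.


Proof (i) Given $\phi \in H^*$, first notice that for any $z$ and $y$ in $H$,

$$(\phi y) z - (\phi z) y \in \ker \phi.$$

Assuming $\phi \ne 0$, pick a unit vector $z \perp \ker \phi$; this is possible since $\ker \phi \ne H$, so
$(\ker \phi)^\perp \ne 0$. Then

$$0 = \langle z, (\phi y) z - (\phi z) y\rangle = (\phi y) - (\phi z)\langle z, y\rangle,$$
$$\phi y = (\phi z)\langle z, y\rangle = \langle x, y\rangle,$$

where $x = \overline{(\phi z)}\, z$. To show that it is unique, suppose $\tilde x$ is another such $x$, then

$$\forall y \in H, \quad \langle x - \tilde x, y\rangle = \langle x, y\rangle - \langle \tilde x, y\rangle = \phi y - \phi y = 0 \ \Leftrightarrow\ x = \tilde x.$$


(ii) Part (i) proves that $J$ is onto and 1-1. Let $x$ and $y$ be two vectors in $H$. Then for
any $z \in H$,

$$(x + y)^*(z) = \langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle = x^* z + y^* z,$$
$$(\lambda x)^*(z) = \langle \lambda x, z\rangle = \bar\lambda\langle x, z\rangle = \bar\lambda x^* z,$$

showing that $(x + y)^* = x^* + y^*$ and $(\lambda x)^* = \bar\lambda x^*$ (conjugate-linear).
To see that $J$ is isometric, note that

$$\|x^*\| = \sup_{y \ne 0} \frac{|x^* y|}{\|y\|} = \sup_{y \ne 0} \frac{|\langle x, y\rangle|}{\|y\|} = \|x\|,$$

using the Cauchy-Schwarz inequality, in particular with $y = x$. □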
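
In $\mathbb{C}^N$ the construction in part (i) of the proof can be carried out explicitly. The sketch below assumes a hypothetical functional $\phi(y) = \sum_n c_n y_n$, picks the unit vector $z \perp \ker\phi$, and forms $x = \overline{(\phi z)}\, z$; all names and numbers are illustrative.

```python
import math

def inner(x, y):
    """Inner product on C^N, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x))

c = [1 + 1j, 2, -1j]   # hypothetical coefficients defining the functional phi

def phi(y):
    """The functional phi(y) = sum_n c_n y_n (linear in y)."""
    return sum(cn * yn for cn, yn in zip(c, y))

# A unit vector orthogonal to ker(phi): the normalized conjugate of c,
# since <conj(c), y> = phi(y) vanishes exactly on the kernel.
z = [cn.conjugate() / norm(c) for cn in c]

# The representing vector from the proof: x = conj(phi(z)) z.
x = [phi(z).conjugate() * zn for zn in z]

test_vectors = [[1, 0, 0], [0, 1j, 0], [2 - 1j, 3, 1 + 4j]]
largest_error = max(abs(inner(x, y) - phi(y)) for y in test_vectors)
```

Here `x` comes out as the conjugate of `c`, and its norm equals that of `c`, in line with the isometry statement of the theorem.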
Frigyes Riesz (1880-1956) was a Hungarian mathematics professor
who proved that $L^2(\mathbb{R})$ is complete; in 1907, with E.S. Fischer,
he proved that Hilbert’s $\ell^2$ space is equivalent to $L^2(\mathbb{R})$; he
defined compact operators abstractly for more general spaces,
including $C[a, b]$ (1918); he introduced the resolvent projection
to part of the spectrum and thus $f(T)$ for compact operators.


Fig. 10.2 Riesz 


Examples 10.17


1. The dual space of $\mathbb{R}$ is (isomorphic to) $\mathbb{R}$ itself. Any $\phi : \mathbb{R} \to \mathbb{R}$ that is linear
must be of type $\phi(t) = \lambda t$ where $\lambda \in \mathbb{R}$.

2. Functionals are simply row vectors when $H = \mathbb{C}^N$; thus $H^*$ is isometric to $\mathbb{C}^N$
and is generated by the dual basis $e_1^\top, \ldots, e_N^\top$.

Proof Let $e_1, \ldots, e_N$ be the standard basis for $\mathbb{C}^N$. Then every functional $\phi$ in
$(\mathbb{C}^N)^*$ is of the type $\phi = (b_n)^\top$, where $b_n := \phi e_n$, Example 8.4(3). Thus the
map $\mathbb{C}^N \to (\mathbb{C}^N)^*$, $y \mapsto y^\top$, where $y^\top x := y \cdot x$, is onto; it is easily seen to
be linear, and continuous from Cauchy’s inequality $|y \cdot x| \le \|y\|\|x\|$. In fact
$\|y^\top\| = \|y\|$ (using $x = (\bar b_n)$). Note that $y^\top = \sum_{n=1}^N b_n e_n^\top$, and $e_n^\top e_m = \delta_{nm}$.

3. It was noted previously that $\ell^{2*} = \ell^2$ and $L^2(\mathbb{R})^* = L^2(\mathbb{R})$ (Exercise 9.23(3)).
These are special cases of the Riesz correspondence.


Exercises 10.18


1. ▸ For $T \in B(X, Y)$ ($X$, $Y$ Hilbert spaces),

$$\|x\| = \sup_{\|y\| = 1} |\langle y, x\rangle|, \qquad \|T\| = \sup_{\|x\| = 1 = \|y\|} |\langle y, Tx\rangle|.$$

2. Show that the norm of $H^*$ comes from the inner product $\langle x^*, y^*\rangle_{H^*} := \langle y, x\rangle_H$.

3. A functional $\phi \in H^*$ corresponds to some vector $x \in H$; if $M$ is a closed linear
subspace of $H$, $\phi$ can be restricted to act on it, $\phi|_M \in M^*$. As $M$ is a Hilbert space
in its own right, what vector $a \in M$ corresponds to $\phi|_M$?

4. A second inner product on $H$ which satisfies $|\langle\!\langle x, y\rangle\!\rangle| \le c\|x\|\|y\|$ must be of
the type $\langle\!\langle x, y\rangle\!\rangle = \langle Tx, y\rangle = \langle x, Ty\rangle$, where $T \in B(H)$, $\|T\| \le c$.

5. Riesz’s theorem holds only for complete inner product spaces (it is false for, say,
$c_{00} \subset \ell^2$). Where is completeness used in the proof of the theorem?
10.4 The Adjoint Map T*


We now seek to find a generalization of the transpose operation on matrices. In finite dimensions, we have (A*v)* = v*A; in terms of inner products, this becomes $\langle A^*v, x\rangle = \langle v, Ax\rangle$. In this form, it can be generalized to any Hilbert space:


Definition 10.19 


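The (Hilbert) adjoint of an operator $T : X \to Y$ between Hilbert spaces is the operator $T^* : Y \to X$ uniquely defined by the relation

\[ \langle T^*y, x\rangle_X = \langle y, Tx\rangle_Y \qquad \forall x \in X,\ y \in Y. \]

That $T^*y$ is uniquely defined follows from the Riesz correspondence applied to the functional $x \mapsto \langle y, Tx\rangle$. Linearity and continuity of $T^*$ follow from

\[ \begin{aligned}
\langle T^*(y_1 + y_2), x\rangle &= \langle y_1 + y_2, Tx\rangle = \langle y_1, Tx\rangle + \langle y_2, Tx\rangle = \langle T^*y_1 + T^*y_2, x\rangle, \\
\langle T^*(\lambda y), x\rangle &= \langle \lambda y, Tx\rangle = \bar\lambda\langle y, Tx\rangle = \langle \lambda T^*y, x\rangle, \\
\|T^*\| &= \sup_{\|y\|=1=\|x\|} |\langle T^*y, x\rangle| = \sup_{\|y\|=1=\|x\|} |\langle y, Tx\rangle| = \|T\|.
\end{aligned} \]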


The properties of the adjoint map are: 


Proposition 10.20 


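$(S+T)^* = S^* + T^*$, $(\lambda T)^* = \bar\lambda T^*$, $(ST)^* = T^*S^*$, $I^* = I$,
$T^{**} = T$, $\|T^*\| = \|T\|$, $\|T^*T\| = \|T\|^2$.


Proof These assertions follow from the following identities, valid for all $x \in X$, $y \in Y$:

\[ \begin{aligned}
\langle (S+T)^*y, x\rangle &= \langle y, (S+T)x\rangle = \langle y, Sx\rangle + \langle y, Tx\rangle = \langle (S^* + T^*)y, x\rangle, \\
\langle (\lambda T)^*y, x\rangle &= \langle y, \lambda Tx\rangle = \lambda\langle T^*y, x\rangle = \langle \bar\lambda T^*y, x\rangle, \\
\langle (ST)^*y, x\rangle &= \langle y, STx\rangle = \langle S^*y, Tx\rangle = \langle T^*S^*y, x\rangle, \\
\langle I^*y, x\rangle &= \langle y, Ix\rangle = \langle y, x\rangle, \\
\langle T^{**}x, y\rangle &= \langle x, T^*y\rangle = \langle Tx, y\rangle, \\
\|T^*T\| &= \sup_{x,y\in S} |\langle y, T^*Tx\rangle| = \sup_{x,y\in S} |\langle Ty, Tx\rangle| = \sup_{x,y\in S} \|Ty\|\|Tx\| = \|T\|^2,
\end{aligned} \]

where $S := \{x : \|x\| = 1\}$, and the equation before the last is valid by the Cauchy-Schwarz inequality, in particular choosing y = x. □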
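In finite dimensions these identities can be checked numerically. The following is a minimal sketch, assuming matrices are stored as lists of rows and that the adjoint is the conjugate transpose; the helper names and the test data are illustrative only.

```python
def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

# Illustrative data (Gaussian integers, so the arithmetic is exact)
S = [[1 + 2j, 0], [3, 1j]]
T = [[2, 1 - 1j], [1j, 4]]
x = [1 + 1j, 2]
y = [3, -1j]

lhs = inner(matvec(adjoint(T), y), x)   # <T*y, x>
rhs = inner(y, matvec(T, x))            # <y, Tx>
product_rule_holds = adjoint(matmul(S, T)) == matmul(adjoint(T), adjoint(S))
```

Both sides of the defining relation agree for this data, and the order reversal in (ST)* = T*S* is visible in the last line.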


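The following proposition reveals an orthogonality between subspaces of adjoint operators. In particular, both $M$ and $M^\perp$ are T-invariant if, and only if, M is T- and T*-invariant.


Proposition 10.21


For an operator T on Hilbert spaces,

\[ \ker T^* = (\operatorname{im} T)^\perp, \qquad \overline{\operatorname{im} T^*} = (\ker T)^\perp. \]

If $T \in B(H)$ and M is a closed linear subspace of H,

\[ M \text{ is } T\text{-invariant} \iff M^\perp \text{ is } T^*\text{-invariant}. \]


Proof The definition $\langle T^*x, y\rangle = \langle x, Ty\rangle$ implies that

\[ x \perp Ty \iff T^*x \perp y, \]

in particular $x \perp \operatorname{im} T \iff T^*x \perp Y \iff x \in \ker T^*$. Consequently, $\ker T^* = (\operatorname{im} T)^\perp$ and thus $\ker T = \ker T^{**} = (\operatorname{im} T^*)^\perp$; furthermore,

\[ (\ker T)^\perp = (\operatorname{im} T^*)^{\perp\perp} = \overline{\operatorname{im} T^*}. \]

Suppose M is T-invariant, and let $x \in M^\perp$, $y \in M$; then $\langle T^*x, y\rangle = \langle x, Ty\rangle = 0$, and $T^*x \in M^\perp$. Conversely, if $M^\perp$ is T*-invariant then $M^{\perp\perp}$ is T**-invariant; but $T^{**} = T$ and $M^{\perp\perp} = M$ for a closed subspace M. □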
Unitary Operators 
Definition 10.22 

A unitary isomorphism $J : X \to Y$ of inner product spaces is defined as a map which preserves all the structure of an inner product space, namely

J is bijective (preservation of the elements),
J is linear (preservation of vector addition and scalar multiplication), and
$\langle Jx, Jy\rangle_Y = \langle x, y\rangle_X$ (preservation of the inner product).

It is obvious that a unitary isomorphism preserves the induced norm (an isometry); the converse is also partly true in Hilbert spaces, because, by the polarization identity, the inner product can be written in terms of norms:


Proposition 10.23 


An operator $U \in B(X, Y)$ on Hilbert spaces preserves the inner product when U preserves the norm,

\[ \forall x, y \in X,\ \langle Ux, Uy\rangle = \langle x, y\rangle \iff U^*U = I \iff \|Ux\| = \|x\| \ \ \forall x \in X. \]

U is unitary when it is also onto.


This statement is basically saying that preserving the inner product (lengths and 'angles') is equivalent to preserving lengths.


Proof The first equivalence is trivial,

\[ \forall x, \tilde x,\ \langle x, \tilde x\rangle = \langle Ux, U\tilde x\rangle = \langle x, U^*U\tilde x\rangle \iff U^*U = I. \]

In particular (taking $\tilde x = x$), U is isometric. The converse implication from the third statement to the first follows from the polarization identity (10.1),

\[ \langle Ux, Uy\rangle = \tfrac14\bigl(\|Ux + Uy\|^2 + \cdots\bigr) = \tfrac14\bigl(\|x + y\|^2 + \cdots\bigr) = \langle x, y\rangle. \]

A superficially different proof of this last fact can be given for complex Hilbert spaces (Example 10.7(3)),

\[ \forall x,\ \langle x, x\rangle = \langle Ux, Ux\rangle = \langle x, U^*Ux\rangle \iff U^*U = I. \]

Since isometries are 1-1, we need only require in addition that it is onto for U to be invertible, in which case $U^{-1} = U^*$. □


Examples 10.24 


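1. The adjoint of a matrix $A = [A_{ij}]$ is the conjugate of its transpose, $\bar A^\top$, since

\[ \langle x, Ay\rangle = \sum_{i,j} \bar x_i A_{ij} y_j = \sum_{i,j} \overline{\bar A_{ij} x_i}\, y_j = \langle \bar A^\top x, y\rangle. \]


2. ▸ The adjoint of the left-shift operator (on $\ell^2$) is the right-shift, $L^* = R$, since, for $x = (a_n)$ and $y = (b_n)$,

\[ \langle L^*y, x\rangle = \langle y, Lx\rangle = \sum_{n=0}^\infty \bar b_n a_{n+1} = \sum_{n=1}^\infty \bar b_{n-1} a_n = \langle Ry, x\rangle, \]

and $R^* = L^{**} = L$.


3. The adjoint of an integral operator on $L^2(\mathbb{R})$,

\[ Tf(y) = \int k(x, y) f(x)\,dx \quad\text{is}\quad T^*g(x) = \int \overline{k(x, y)}\, g(y)\,dy. \]

Proof
\[ \begin{aligned}
\langle g, Tf\rangle &= \int \overline{g(y)} \int k(x, y) f(x)\,dx\,dy \\
&= \iint k(x, y)\overline{g(y)} f(x)\,dy\,dx \\
&= \int \overline{\int \overline{k(x, y)}\, g(y)\,dy}\; f(x)\,dx = \langle T^*g, f\rangle.
\end{aligned} \]


4. The unitary² isomorphisms of $\mathbb{R}^2$ are the rotations and reflections. More generally, those of $\mathbb{R}^N$ are the matrices whose columns are orthonormal (mutually orthogonal and of unit norm).

Proof The column vectors $u_i$ of a unitary matrix U satisfy $u_i = Ue_i$, where $e_i$ are the standard basis for $\mathbb{R}^N$. Then, $\langle u_i, u_j\rangle = \langle Ue_i, Ue_j\rangle = \langle e_i, e_j\rangle = \delta_{ij}$.


5. ▸ By itself, $U^*U = I$ ensures that a linear operator $U : X \to Y$ is isometric (and 1-1), but not that it is onto, that is, it is an isometric embedding of X into Y. For example, the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}$ embeds $\mathbb{R}^2$ into $\mathbb{R}^3$. In general, $UU^*$ is not equal to I but is a projection of Y onto $\operatorname{im} U \subseteq Y$.

Proof Clearly, $UU^*UU^* = UU^*$ is a projection from Y to $\operatorname{im} U$. It is onto since $UU^*(Ux) = Ux$.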
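The 3 × 2 matrix of the last example can be checked directly. This is a small sketch in plain Python (the helper names are illustrative); since the matrix is real, its adjoint is simply the transpose.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

U = [[0, 1],
     [1, 0],
     [0, 0]]           # embeds R^2 into R^3

Ut = transpose(U)      # U* for a real matrix
UtU = matmul(Ut, U)    # identity on R^2: U is isometric
UUt = matmul(U, Ut)    # not the identity on R^3, but a projection onto im U
is_projection = matmul(UUt, UUt) == UUt
```

Here `UtU` is the 2 × 2 identity, whereas `UUt` is the diagonal matrix diag(1, 1, 0), which fixes im U and annihilates its orthogonal complement.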


10.5 Inverse Problems 


When an operator T : X — Y is not onto, the equation 7x = y need not have a 
solution. The next best thing to ask for is a vector x which minimizes ||Tx — yl]. 


2 More properly called orthogonal isomorphisms when the space is real. 
Proposition 10.25


For an operator $T : H_1 \to H_2$ between Hilbert spaces and a vector $y \in H_2$, a vector $x \in H_1$ minimizes $\|Tx - y\|$ if, and only if,

\[ T^*Tx = T^*y. \]


Proof Suppose $T \in B(X, Y)$, and consider the closed linear subspace $M := \overline{\operatorname{im} T} \subseteq Y$. For each $y \in Y$, there is a unique vector $y_* \in M$ which is closest to it. As proved in Theorem 10.12, a necessary and sufficient condition for $y_*$ is $y - y_* \in M^\perp = \ker T^*$ (Proposition 10.21), that is, $T^*y_* = T^*y$.

If $y_*$ happens to be in $\operatorname{im} T$, i.e., $y_* = Tx$, then the equation becomes $T^*Tx = T^*y$; this can only occur when $y \in \operatorname{im} T \oplus (\operatorname{im} T)^\perp$, a dense subspace of Y. When $\operatorname{im} T$ is closed, e.g. in finite dimensions, this is the case for all $y \in Y$.

If $y_* \notin \operatorname{im} T$ then we can only conclude that there is some sequence of vectors $x_n \in X$ such that $Tx_n \to y_*$, and so $T^*Tx_n \to T^*y$. Thus $\|Tx_n - y\|$ converges to $\|y_* - y\|$, but is never equal to it (by uniqueness of $y_*$). □


To continue this discussion, the above situation in the case of finite dimensions is typical of an overdetermined system of equations, that is, a system $Tx = b$ that represents more equations than there are unknowns. The least squares solution is then found to be

\[ x = (T^*T)^{-1}T^*b \]

at least in the generic case when T is 1-1. Then $T^*T$ is also 1-1 since $T^*Tx = 0 \Rightarrow \|Tx\|^2 = \langle x, T^*Tx\rangle = 0 \Leftrightarrow x = 0$, so it is invertible at least on $\operatorname{im} T^*$.

The dual problem is that of an underdetermined system of equations, $Tx = b$, where there are fewer equations than unknowns. There is an oversupply of solutions, namely any vector in $x_0 + \ker T$, where $x_0$ is any single solution of the equation, and $\ker(T^*T) = \ker T \ne 0$. In this case, a unique x that is closest to 0, i.e., has the least norm, can be selected from all these solutions. That is, we seek $x \in (\ker T)^\perp = \operatorname{im} T^*$ (in finite dimensions, every subspace is closed). Thus $x = T^*y$ and $b = Tx = TT^*y$, so the required least norm vector is

\[ x = T^*(TT^*)^{-1}b. \]
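Both formulas are easy to try out on small real matrices. The sketch below is only an illustration: it assumes real data (so T* is the transpose), uses exact rational arithmetic from the standard library, and the function names `solve`, `least_squares` and `least_norm` are hypothetical.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])]
         for i, row in enumerate(A)]
    for c in range(n):
        # first row with a nonzero pivot (fails if A is singular)
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def least_squares(T, b):
    """x = (T*T)^{-1} T* b, for a real 1-1 matrix T (overdetermined case)."""
    Tt = transpose(T)
    rhs = [sum(t * bi for t, bi in zip(row, b)) for row in Tt]
    return solve(matmul(Tt, T), rhs)

def least_norm(T, b):
    """x = T*(TT*)^{-1} b, for a real onto matrix T (underdetermined case)."""
    Tt = transpose(T)
    y = solve(matmul(T, Tt), b)
    return [sum(t * yi for t, yi in zip(row, y)) for row in Tt]

# Three equations in two unknowns: x1 = 1, x2 = 1, x1 + x2 = 0
x_over = least_squares([[1, 0], [0, 1], [1, 1]], [1, 1, 0])
# One equation in two unknowns: x1 + x2 = 2
x_under = least_norm([[1, 1]], [2])
```

For the inconsistent system the compromise is x = (1/3, 1/3); for the single equation the solution of least norm is (1, 1), the point of the line x1 + x2 = 2 closest to the origin.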


In the general case, an operator need be neither 1-1 nor onto, so the set of vectors which minimize $\|Tx - y\|$ is a coset, $x + \ker T$. But since $\ker T$ is a closed subspace, the coset has a unique vector with smallest norm. The mapping from y to this $x \in (\ker T)^\perp$ is then well-defined for $y \in \operatorname{im} T + (\operatorname{im} T)^\perp$ and is denoted by $T^\dagger$, called the Moore-Penrose pseudo-inverse. To recap,

\[ T^\dagger : \operatorname{im} T \oplus (\operatorname{im} T)^\perp \subseteq Y \to X, \qquad y \mapsto x \ \text{ where } T^*Tx = T^*y,\ x \in (\ker T)^\perp. \]


In the simple case when T is invertible, so $\operatorname{im} T = Y$, it reduces to the usual inverse $T^\dagger = (T^*T)^{-1}T^* = T^{-1}$. For example, every m × n matrix and vector has a pseudo-inverse, e.g. $x^\dagger = x^*/\|x\|^2$, so that $x^\dagger x = 1$ (except that $0^\dagger = 0$).


[Figure: diagram of T : X → Y, showing the coset x + ker T in X and im T in Y]

The equations introduced above have found an extremely fertile scope for applications. In many scientific or engineering contexts, an abundant number of measurements of a few variables in general gives an overdetermined system of equations. This also occurs when there is loss of information during measurement, so that the 'space of measurements' (im T) is a proper subspace of the space of variables (H). A small sample of applications is given below:


Regression 


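To find the best-fitting (least-squares) line $y = mx + c$ to N given points $(x_n, y_n) \in \mathbb{R}^2$, minimizing the errors in $y_n$, we require that $mx_n + c$ be collectively as close to $y_n$ as possible. In matrix form, we require

\[ \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \]

written as $A\mathbf{m} = \mathbf{b}$. As this usually has no exact solution, the best alternative is $A^*A\mathbf{m} = A^*\mathbf{b}$,

\[ \begin{pmatrix} x_1 & \cdots & x_N \\ 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} x_1 & \cdots & x_N \\ 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \]

\[ \begin{pmatrix} \sum_n x_n^2 & \sum_n x_n \\ \sum_n x_n & N \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} \sum_n x_n y_n \\ \sum_n y_n \end{pmatrix}. \]

Solving for $\mathbf{m} = \begin{pmatrix} m \\ c \end{pmatrix}$ gives the usual regression line as used in statistics.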
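The 2 × 2 normal equations can be solved in closed form by Cramer's rule. The following is a minimal sketch; the function name `fit_line` and the sample points are illustrative.

```python
def fit_line(points):
    """Least-squares line y = m x + c through a list of (x, y) pairs."""
    N = len(points)
    Sx = sum(x for x, _ in points)
    Sy = sum(y for _, y in points)
    Sxx = sum(x * x for x, _ in points)
    Sxy = sum(x * y for x, y in points)
    det = N * Sxx - Sx * Sx          # vanishes only if all x_n coincide
    m = (N * Sxy - Sx * Sy) / det
    c = (Sxx * Sy - Sx * Sxy) / det
    return m, c

exact = fit_line([(0, 1), (1, 3), (2, 5)])    # points lying on y = 2x + 1
noisy = fit_line([(0, 0), (1, 1), (2, 1)])    # no line passes through all three
```

When the points are collinear the line is recovered exactly; otherwise the result is the compromise that minimizes the sum of squared errors in y.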


This technique is not at all restricted to fitting straight lines. Suppose it is required to approximate data points $(x_n, y_n)$ by a quadratic polynomial $a + bx + cx^2$. This is the same as trying to solve the matrix equation

\[ \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & x_N^2 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}. \]

Repeating the above procedure gives the solution

\[ \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \frac{1}{\Delta} \begin{pmatrix}
(S_2^2 - S_1S_3)\sum_n x_n^2 y_n + (S_1S_4 - S_2S_3)\sum_n x_n y_n + (S_3^2 - S_2S_4)\sum_n y_n \\
(S_0S_3 - S_1S_2)\sum_n x_n^2 y_n + (S_2^2 - S_0S_4)\sum_n x_n y_n + (S_1S_4 - S_2S_3)\sum_n y_n \\
(S_1^2 - S_0S_2)\sum_n x_n^2 y_n + (S_0S_3 - S_1S_2)\sum_n x_n y_n + (S_2^2 - S_1S_3)\sum_n y_n
\end{pmatrix} \]

where $S_k = \sum_n x_n^k$ and $\Delta = S_2^3 - 2S_1S_2S_3 + S_1^2S_4 - S_0S_2S_4 + S_0S_3^2$. (Note: In practice, one does not need to program these formulae; multiplying out $T^*T$ as a numerical matrix and solving $T^*Tx = T^*b$ directly is usually a better option.)
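The remark about solving the normal equations directly can be illustrated as follows. This is a sketch under stated assumptions: real data, exact rational arithmetic, and hypothetical helper names `solve` and `polyfit`; it works for any polynomial degree, not only quadratics.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def polyfit(points, degree):
    """Coefficients (a_0, ..., a_degree) of the least-squares polynomial."""
    # Rows of T are (1, x_n, x_n^2, ...); b holds the y_n.
    T = [[Fraction(x) ** k for k in range(degree + 1)] for x, _ in points]
    b = [Fraction(y) for _, y in points]
    m = degree + 1
    gram = [[sum(row[i] * row[j] for row in T) for j in range(m)]
            for i in range(m)]                          # T*T
    rhs = [sum(row[i] * bi for row, bi in zip(T, b)) for i in range(m)]  # T*b
    return solve(gram, rhs)

# Sample points chosen to lie on y = 1 + x^2
coeffs = polyfit([(-1, 2), (0, 1), (1, 2), (2, 5)], 2)
```

For these points the fit returns (a, b, c) = (1, 0, 1), as expected when the data lie exactly on a parabola.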


Tikhonov Regularization 


The Moore-Penrose pseudo-inverse is usually either not a continuous operator or has a large condition number; its solutions tend to fluctuate with slight changes in the data (e.g. errors). To address this deficiency, a number of different regularization techniques are employed whose aim is to improve the ill-conditioning. One of the more popular techniques is attributed to Tikhonov; it balances out finding the best approximate solution of $Tx = y$ with x having a small norm by seeking the minimum of $\|Tx - y\|^2 + \alpha\|x\|^2$, where $\alpha > 0$ is some pre-determined parameter.

To solve this minimization problem, consider the following more general formulation: Let H be a real Hilbert space and suppose $A \in B(H)$, $b \in H$, and $c \in \mathbb{R}$; to find the minimum of the quadratic function $q : H \to \mathbb{R}$,

\[ q(x) := \langle x, Ax\rangle + \langle b, x\rangle + c. \]

Taking small variations of the minimum point x, namely $x + tv$, we deduce

\[ \begin{aligned}
&\forall t \in \mathbb{R},\ \forall v \in H, \quad q(x) \le q(x + tv) = \langle x + tv, Ax + tAv\rangle + \langle b, x + tv\rangle + c \\
&\therefore\ 0 \le t\langle v, Ax + A^*x + b\rangle + t^2\langle v, Av\rangle, \\
&\therefore\ \forall t > 0, \quad -t\langle v, Av\rangle \le \langle v, Ax + A^*x + b\rangle \le t\langle v, Av\rangle.
\end{aligned} \]

As t and v are arbitrary, it must be the case that x satisfies

\[ (A + A^*)x + b = 0. \]

In particular, minimizing $\|Tx - y\|^2 = \langle x, T^*Tx\rangle - 2\langle T^*y, x\rangle + \|y\|^2$ gives the equation inferred previously, $T^*Tx = T^*y$. Similarly, that x which minimizes

\[ \|Tx - y\|^2 + \alpha\|x\|^2 = \langle x, (T^*T + \alpha I)x\rangle - 2\langle T^*y, x\rangle + \|y\|^2 \]

solves the equation

\[ (T^*T + \alpha I)x = T^*y. \]

This is the regularized version of the last proposition. It will be proved later that $T^*T + \alpha I$ is always invertible (regular) for $\alpha > 0$ (Proposition 15.42). This gives an excellent alternative to the Moore-Penrose solution when $y \notin \operatorname{im} T + (\operatorname{im} T)^\perp$, although choosing the parameter $\alpha$ may not be straightforward.
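In finite dimensions the regularized equation is an ordinary linear system. The sketch below is only an illustration, assuming a real matrix T and exact rational arithmetic; the function name `tikhonov` and the nearly singular example are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def tikhonov(T, y, alpha):
    """Solve (T*T + alpha I) x = T*y for a real matrix T."""
    n = len(T[0])
    gram = [[sum(row[i] * row[j] for row in T) + (alpha if i == j else 0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * yi for row, yi in zip(T, y)) for i in range(n)]
    return solve(gram, rhs)

# Two nearly parallel columns make T ill-conditioned
T = [[1, 1], [1, Fraction(1001, 1000)]]
y = [2, 2]
x_plain = tikhonov(T, y, 0)                   # alpha = 0: the plain normal equations
x_reg = tikhonov(T, y, Fraction(1, 100))      # a small regularization parameter

norm2 = lambda v: sum(c * c for c in v)       # squared Euclidean norm
```

With α = 0 the solution is (2, 0); a small positive α trades a slightly larger residual for a solution of smaller norm.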


Algebraic Reconstruction Technique 


ART is an iterative algorithm that generates a solution x of the (real) equation $Ax = \mathbf{b}$. The matrix equation can be rewritten as $\langle a_n, x\rangle = b_n$, $n = 1, \ldots, N$, where $a_n$ are the rows of A. The iteration is defined in terms of affine projections (Example 10.14(2))

\[ x_n = x_{n-1} + \frac{b_n - \langle a_n, x_{n-1}\rangle}{\|a_n\|^2}\, a_n, \qquad x_0 \in H. \]

The indices of $a_n$ and $b_n$ are to be understood as modulo N ($a_{N+1} = a_1$, etc). We show below that starting from any $x_0 \in H$, the iteration converges to the closest point $x_*$ to $x_0$ that is a solution of $Ax_* = \mathbf{b}$. Note that starting from $x_0 = 0$ results in the Moore-Penrose (least-norm) solution.

To see why this works, let $M_n := a_n^\perp$ (cycling through $n = 1, \ldots, N$), then $M := \bigcap_n M_n$ contains all the solutions of $Av = 0$; let also $v_n := x_n - x_*$. The iteration becomes

\[ v_n = v_{n-1} - \langle \hat a_n, v_{n-1}\rangle \hat a_n = P_n v_{n-1} \in M_n, \]

where $\hat a_n = a_n/\|a_n\|$, and $P_n$ is the projection onto the hyperplane $M_n$. Notice that $v_0 = x_0 - x_* \in M^\perp$, as well as $v_n - v_{n-1} \in M^\perp$, so the entire sequence $v_n$ lies in $M^\perp$.

Consider the operator $Q := P_N \cdots P_1$ acting on $M^\perp$; its norm is bounded by 1 because $\|P_i\| \le 1$ for each i. If $1 = \|Q\| = \sup_{\|w\|=1} \|Qw\|$, then the supremum is achieved by some unit vector $w \in M^\perp$ since the unit ball is compact in finite dimensions and $w \mapsto \|Qw\|$ is a continuous function. Denote $w_i := P_i w_{i-1} = w_{i-1} - \langle \hat a_i, w_{i-1}\rangle \hat a_i$, with $w_0 := w$; then

\[ 1 = \|Qw\| = \|P_N w_{N-1}\| \le \|w_{N-1}\| \le \|w_{N-2}\| \le \cdots \le \|w_1\| \le \|w\| = 1 \]

forces all $w_i$ to have norm 1. But, since $\|w_{i-1}\|^2 = \|w_i\|^2 + |\langle \hat a_i, w_{i-1}\rangle|^2$, it follows that $\langle \hat a_i, w_{i-1}\rangle = 0$ and $w_i = w_{i-1}$ for $i = 1, \ldots, N$. Hence $w \in M_1 \cap \cdots \cap M_N = M$, yet $w \in M^\perp$ is a unit vector.

This contradiction implies $\|Qv\| \le c\|v\|$, $c < 1$, for any $v \in M^\perp$. Hence $\|v_{n+N}\| = \|Qv_n\| \le c\|v_n\|$; combined with $\|v_{n+1}\| \le \|v_n\|$, we get $v_n \to 0$. Equivalently, $x_n$ converges to $x_*$.

The advantages of ART are that it uses less computer memory and is flexible in that it can be used even if there is missing data or newly available data (missing or new rows of A); but, being an iterative procedure, it is generally slower to converge.
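The iteration takes only a few lines to program. The following is a minimal sketch, assuming a consistent real system with nonzero rows; the function name `art`, the fixed number of sweeps, and the sample system are illustrative choices rather than part of the method.

```python
def art(A, b, x0, sweeps=200):
    """Cyclic row projections: x <- x + (b_n - <a_n, x>) a_n / ||a_n||^2."""
    x = list(x0)
    for _ in range(sweeps):
        for a, bn in zip(A, b):                   # cycle through the rows a_n
            residual = bn - sum(ai * xi for ai, xi in zip(a, x))
            step = residual / sum(ai * ai for ai in a)
            x = [xi + step * ai for xi, ai in zip(x, a)]
    return x

# An underdetermined system: x1 + x2 = 2 and x2 + x3 = 2
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 2.0]
x = art(A, b, [0.0, 0.0, 0.0])
```

Starting from the zero vector, the iterates approach (2/3, 4/3, 2/3), which is the least-norm solution A*(AA*)⁻¹b of this system.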


Wiener Deconvolution 


When a signal $f \in L^2(\mathbb{R})$ passes through a 'circuit' (which could be the atmosphere, say, or a measuring apparatus), it is modified in two ways: (i) the signal is distorted slightly to $Kf := k * f$, where $k \in L^1(\mathbb{R})$ is characteristic of the circuit (recall convolution, Example 8.6(5)), (ii) random noise in the process adds a little error $\epsilon \in L^2(\mathbb{R})$ to the signal. The net effect is a distorted output signal $y = k * f + \epsilon$. Is it possible to extract the original signal f back again from y? A full reconstruction by solving $Kf = y$ is impossible as lost information cannot be regained; the im K subspace is not the full space $L^2(\mathbb{R})$, and the error displaces the signal off this subspace. But one can use Tikhonov regularization and solve $(K^*K + \alpha)f = K^*y$. The simplest way to do this is to use the properties of the Fourier transform, which converts convolution to multiplication. As in Example 10.24(3), the adjoint of K is given by $K^*g = k^- * g$ where $k^-(t) := \overline{k(-t)}$, since

\[ \langle K^*g, f\rangle = \langle g, Kf\rangle = \iint \overline{g(s)}\,k(s - t)\,f(t)\,dt\,ds = \int \overline{\int \overline{k(s - t)}\,g(s)\,ds}\; f(t)\,dt. \]

The Fourier transform of $k^-$ is

\[ \widehat{k^-}(\xi) = \int e^{-2\pi i\xi t}\,\overline{k(-t)}\,dt = \overline{\int e^{-2\pi i\xi t}\,k(t)\,dt} = \overline{\hat k(\xi)}, \]

so that $(K^*K + \alpha)f = K^*y$ transforms to

\[ \hat f = \frac{\overline{\hat k}\,\hat y}{|\hat k|^2 + \alpha}. \]

This is a recipe for finding f from y, called deconvolution, that is commonly implemented as a computer program using the Fast Fourier Transform, or directly as an electrical filter circuit.
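A discrete analogue of this recipe can be sketched for periodic signals, where circular convolution plays the role of k * f. The code below is an illustration under these assumptions; it uses a naive O(N²) discrete Fourier transform in place of the FFT to stay within the standard library, and the signal, kernel, and function names are hypothetical.

```python
import cmath

def dft(v, sign=-1):
    """Unnormalized discrete Fourier transform (sign=+1 gives the inverse kernel)."""
    N = len(v)
    return [sum(v[n] * cmath.exp(sign * 2j * cmath.pi * m * n / N)
                for n in range(N)) for m in range(N)]

def idft(V):
    N = len(V)
    return [z / N for z in dft(V, sign=+1)]

def circular_convolve(k, f):
    N = len(f)
    return [sum(k[(i - j) % N] * f[j] for j in range(N)) for i in range(N)]

def wiener_deconvolve(k, y, alpha):
    """f_hat = conj(k_hat) y_hat / (|k_hat|^2 + alpha), then transform back."""
    K, Y = dft(k), dft(y)
    F = [Km.conjugate() * Ym / (abs(Km) ** 2 + alpha) for Km, Ym in zip(K, Y)]
    return [z.real for z in idft(F)]

f = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]   # illustrative signal
k = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]   # illustrative blur kernel
y = circular_convolve(k, f)                     # distorted signal, no noise added
f_rec = wiener_deconvolve(k, y, alpha=1e-9)
```

In this noise-free example the kernel's transform never vanishes, so a tiny α recovers f almost exactly; with noisy data a larger α would be needed to keep the division stable.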
Fig. 10.3 Image reconstruction. (i) The original image, (ii) after it passes an imaging device (exaggerated), (iii) the best-fit image


Image Reconstruction 


An image can also be considered as a 'signal', this time in $L^2(\mathbb{R}^2)$, or, when discretized, as a vector of numbers in the form of an array of pixels. Each number represents the brightness of a pixel (neglecting the color content for simplicity). An imaging apparatus transforms the original image x to $y = Ax + \epsilon$, where A is assumed to be a linear operator, as above; examples include a slight spherical aberration or blurring in general. Since such modification incurs a loss of information, the distortion matrix A is not invertible, but the best-fit "regularized" solution $x = (A^*A + \alpha I)^{-1}A^*y$ restores the image somewhat, as seen in Fig. 10.3.

In practice, implementing the reconstruction encounters difficulties that are specific to images. Images are typically in the order of about a million pixels in size; the matrix A would therefore consist of about a trillion coefficients (most of which are zero), and finding the inverse of $A^*A + \alpha I$ is prohibitively time-consuming.
Fortunately, blurring is to a good approximation usually independent of the pixel 
positions; for example, a linear motion blur produces the same streaks everywhere 
across the picture (but note that this is not true for a rotation blur). In mathemati- 
cal terms, the transformation A can be taken to be translation invariant, so that it is 
equivalent to the convolution by some vector k € H. With this simplification, image 
reconstruction becomes a 2-dimensional version of Wiener deconvolution; the same 
technique using the Fourier transform can be applied, 


\[ \hat x = \frac{\overline{\hat k}\,\hat y}{|\hat k|^2 + \alpha}. \]
Here, y represents the discrete version of the Fourier transform, namely y,, = 
fe erm. The resulting x may have negative coefficients; these are mean- 


ingless and usually replaced by 0. 


Tomography 


Suppose that instead of a vector x, one is given 'views' of it, $y_n := \langle a_n, x\rangle$, where $a_n$ is a list of known vectors: Is it possible to reconstruct x from these views? If $a_n$ are assembled as rows of a matrix A, one obtains a matrix equation $Ax = y$. In such problems, it may be the case that the number of views is less than the dimension of the vector space, so that the system is under-determined, or that there are a large number of views, making the equation over-determined. In either case, a least-squares solution can be found as above, using the techniques of inverse problem solving (Fig. 10.4).

Fig. 10.4 Computed tomography. (i) The original image (360 x 360 pixels), (ii) 80 parallel 'views' of the object, (iii) the best-fit reconstruction from 6,400 views (80 directions)

CT scans: An x-ray passing through a 3-D object of density f diminishes in intensity by an amount $e^{-\int f(a + bt)\,dt}$ where $a + bt$ is the straight line followed by the ray. The emitted and received intensity can be measured and, after taking logs, one obtains a 'view' of the object

\[ y = \int f(a + bt)\,dt = \langle 1_{a,b}, f\rangle, \]

where $1_{a,b}$ is the characteristic function of the ray, i.e., a function that is 1 along the ray and 0 outside it (in practice, the ray has a finite width). It should be possible to reconstruct f from a large number of these views. A CT-scan does precisely this: an X-ray source coupled with a detector rotate around the object to produce these views. In one simple configuration, $b = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$ and $a = s\begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$; the collection of these views, as a function of $\theta$ and s, is called the Radon transform R of f. The best-fit f that reproduces the data is computed by solving $(R^*R + \alpha)f = R^*y$, either
directly in the form of the optimized Filtered Back Projection (FBP) algorithm or 
by iterative algorithms such as some variants of ART. Other configurations include 
a fixed source and a rotating detector, producing a fan-shaped collection of rays. 
In yet other applications, the ‘rays’ move along curved lines; more generally, the 
output may depend non-linearly on f and the source (see [21] for an overview of 
tomography and inverse scattering theory). 

The idea obviously has lots of potential: x-ray tomography has revolutionized 
medical diagnosis, archaeology, and fossil analysis; crystal x-ray diffraction 
tomography recreates the atomic configuration of molecules in a lattice; impedance 
tomography takes output currents from input voltages to reconstruct the interior 
resistance density of an object; seismographs measure the output vibrations after the
occurrence of earthquakes to reconstruct the interior density of the Earth; gravity, 
magnetic, or sound measurements at the Earth’s surface can determine rock densi- 
ties underneath, aiding in the exploration for oil or minerals; ultrasound echoes or 
scattered light can be used to reconstruct 3-D images of internal organs (or of moths 
and fish/squid by bats and dolphins). The list is long and increasing! 


Exercises 10.26 


1. If T is invertible then $(T^{-1})^* = (T^*)^{-1}$.

2. Use $\|T^*T\| = \|T\|^2$ to show $\|T^*\| = \|T\|$.

3. ▸ The adjoint of the multiplier operator in $\ell^2$, $x \mapsto ax$, is $y \mapsto \bar a y$.

4. Let $a \in \ell^1(\mathbb{Z})$, then Young's inequality (Exercise 9.15(5)) shows that the linear map $x \mapsto a * x$ is continuous on $\ell^2(\mathbb{Z})$. Its adjoint is $y \mapsto a^- * y$ where $(a_n)^- = (\bar a_{-n})$.

5. The Volterra operator on $L^2[0, 1]$, $Vf(x) := \int_0^x f$, has adjoint $V^*f(x) = \int_x^1 f$.

6. Let $\langle\langle x, y\rangle\rangle := \langle x, Ay\rangle$ be a new inner product, then the adjoint of T with respect to it is $T^\star := (ATA^{-1})^*$.

7. If $R \in B(X, Y)$ then $T \mapsto RTR^*$ is an operator $B(X) \to B(Y)$.

8. For any $T \in B(H_1, H_2)$, $\ker(T^*T) = \ker T$ and $\overline{\operatorname{im} T^*T} = \overline{\operatorname{im} T^*}$.

9. A linear map $T : X \to Y$ is said to be conformal when it preserves orthogonality,

\[ \forall x, \tilde x \in X, \quad \langle x, \tilde x\rangle = 0 \ \Rightarrow\ \langle Tx, T\tilde x\rangle = 0. \]

Show that this is the case if, and only if, $T^*T = \lambda I$ for some $\lambda \ge 0$. Moreover, angles between vectors are preserved (for $\lambda > 0$).

In particular, two inner products on the same vector space are conformal when $\langle\langle x, y\rangle\rangle = \lambda\langle x, y\rangle$ for some $\lambda > 0$.

10. * Show that a map between Hilbert spaces which preserves the inner product must be linear. Deduce that isometries on a real Hilbert space must be of the type $f(x) = Ux + a$ where $U^*U = I$ and $a \in H$.

(Hint: Let $g(x) := f(x) - f(0)$, an isometry; show $\langle g(x + y), g(z)\rangle = \langle g(x) + g(y), g(z)\rangle$, so $g(x + y) - g(x) - g(y) \in [\![\operatorname{im} g]\!] \cap (\operatorname{im} g)^\perp$.)


11. Find best approximate solutions for

(i) $\begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix} x = \begin{pmatrix} 4 \\ -1 \\ 0 \end{pmatrix}$, (ii) eat


12. To find the best-fitting plane $z = ax + by + c$ to a number of points $(x_n, y_n, z_n)$, where $z_n$ is the dependent variable, least squares approximation gives

\[ \begin{pmatrix} \sum_n 1 & \sum_n x_n & \sum_n y_n \\ \sum_n x_n & \sum_n x_n^2 & \sum_n x_n y_n \\ \sum_n y_n & \sum_n x_n y_n & \sum_n y_n^2 \end{pmatrix} \begin{pmatrix} c \\ a \\ b \end{pmatrix} = \begin{pmatrix} \sum_n z_n \\ \sum_n x_n z_n \\ \sum_n y_n z_n \end{pmatrix}. \]


13. * The method is not at all restricted to linear geometric objects. Find the best-fitting circle $x^2 + y^2 + ax + by = c$ to a number of points $(x_n, y_n)$.


14. The pseudo-inverse of the left-shift operator on $\ell^2$ is the right-shift operator, and vice versa.


15. For any $T \in B(X, Y)$, $TT^\dagger T = T$, because both x and $T^\dagger Tx$ belong to $x + \ker T$. So $T^\dagger T$ and $TT^\dagger$ are projections; which precisely?


16. The transformation $T^\dagger : \operatorname{im} T \oplus (\operatorname{im} T)^\perp \to (\ker T)^\perp$ is linear but continuous only when $\operatorname{im} T$ is closed (Hint: if $Tx_n \to y$ then $Tx_n = TT^\dagger Tx_n \to TT^\dagger y$).


17. Recall the Volterra 1-1 operator $Vf(x) := \int_0^x f$ on $L^2[0, 1]$. If g is differentiable, then $V^\dagger g = g'$, and the Tikhonov regularization solves the equation $f - \alpha f'' = g'$.

18. An oscillating pendulum is captured on video at 25 frames/s. The angle $\theta$ (in rad) that the pendulum makes with the vertical, for 1 s worth of frames (1-26), is given in the table below. Theoretically, $\theta$ satisfies

\[ \ddot\theta + \frac{\kappa}{m}\dot\theta^2 + \frac{g}{r}\sin\theta = 0, \]

where $g = 9.81\,\mathrm{m\,s^{-2}}$, and $\kappa/m$ and r are unknown numbers. From the data, estimate $\dot\theta_n$ by $(\theta_{n+1} - \theta_{n-1})/2\delta t$, and $\ddot\theta_n$ by $(\theta_{n+1} - 2\theta_n + \theta_{n-1})/\delta t^2$, thereby getting equations of the type $ax_n + by_n = z_n$, where $x_n = \dot\theta_n^2$, $y_n = \sin\theta_n$, $z_n = -\ddot\theta_n$, and a, b are unknown constants. Use regression to find a, b (hence r and $\kappa/m$) that best fit these data.


Frame | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
θ | 0.372 | 0.210 | 0.043 | -0.126 | -0.291 | -0.447 | -0.589 | -0.714 | -0.816
Frame | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
θ | -0.900 | -0.957 | -0.988 | -0.993 | -0.972 | -0.923 | -0.854 | -0.756 | -0.640
Frame | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26
θ | -0.505 | -0.353 | -0.192 | -0.025 | 0.144 | 0.308 | 0.462 | 0.600


19. Phylogeny: Bioinformaticians can create a score of how far apart two species are genetically. An example is given in the adjoining table, together with the suspected evolutionary tree. Assign constants to each edge in the tree which best match the given scores, i.e., the sum of the edge constants along the path from, say, A to D should be as close to 6.16 as possible.

  | A | B | C | D
B | 2.22 | - | |
C | 6.12 | 5.60 | - |
D | 6.16 | 5.70 | 1.70 | -

[Figure: the suspected evolutionary tree for the species A, B, C, D]


10.6 Orthonormal Bases 


Definition 10.27 


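An orthonormal basis of a Hilbert space H is a set of orthonormal vectors E whose span is dense,

\[ \forall e_i, e_j \in E,\ \langle e_i, e_j\rangle = \delta_{ij}, \qquad \overline{[\![E]\!]} = H. \]

The second condition is equivalent to $E^\perp = 0$ (Example 10.14(4)), i.e.,

\[ \forall e \in E,\ \langle e, x\rangle = 0 \ \Rightarrow\ x = 0. \]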


Examples 10.28 


1. The sequences $e_n := (\delta_{ni}) = (0, \ldots, 0, 1, 0, \ldots)$ form an orthonormal basis for $\ell^2$.

Proof Orthonormality is obvious,

\[ \langle e_n, e_m\rangle_{\ell^2} = \langle (0, \ldots, 0, \underset{n}{1}, 0, \ldots), (0, \ldots, 0, \underset{m}{1}, 0, \ldots)\rangle = \delta_{nm}. \]

If the sequence $x = (a_0, a_1, \ldots)$ is in $[\![e_0, e_1, \ldots]\!]^\perp$, then $a_n = \langle e_n, x\rangle_{\ell^2} = 0$ for any n; hence $x = 0$.


2. Gram-Schmidt orthogonalization: Any countable number of vectors $\{v_n\}$ can be replaced by a set of orthonormal vectors having the same span, using the Gram-Schmidt algorithm:

\[ \begin{aligned}
u_0 &:= v_0, & e_0 &:= u_0/\|u_0\|, \\
u_n &:= v_n - \sum_{i=0}^{n-1} \langle e_i, v_n\rangle e_i, & e_n &:= u_n/\|u_n\|.
\end{aligned} \]

It may very well happen that $u_n = 0$, in which case it and $v_n$ are discarded and $v_{n+1}$ relabeled as $v_n$. Clearly, $[\![e_0, \ldots, e_n]\!] = [\![v_0, \ldots, v_n]\!]$, not taking the discarded $v_n$ into account. Hence $[\![e_0, e_1, \ldots]\!] = [\![v_0, v_1, \ldots]\!]$.


3. Suppose $x = \sum_m \alpha_m e_m$ for an orthonormal basis $\{e_0, e_1, e_2, \ldots\}$; then taking the inner product with $e_n$ gives the simple formula $\alpha_n = \langle e_n, x\rangle$. The next section discusses whether every x can be so written.


4. The set of basis vectors need not be countable; when uncountable, the Hilbert space is not separable, because the vectors $e_n$ are equally distant from each other, $\|e_n - e_m\| = \sqrt2$, so that the balls $B_\epsilon(e_n)$ are disjoint for $\epsilon < \sqrt2/2$ (Exercise 4.21(4)). Conversely, if $E := \{e_n\}$ is a countable orthonormal basis, then $[\![E]\!]$, and $H = \overline{[\![E]\!]}$, are separable.


5. * Every Hilbert space has an orthonormal basis.

Proof Consider the collection of all orthonormal sets of vectors. It is nonempty, so Hausdorff's maximality principle implies that there is a maximal chain of orthonormal sets $E_\alpha$. But $E := \bigcup_\alpha E_\alpha$ is also an orthonormal set, for pick any two distinct vectors $e_\alpha \in E_\alpha$ and $e_\beta \in E_\beta \subseteq E_\alpha$, say, then $e_\beta \perp e_\alpha$. So E is a maximal set of orthonormal vectors. $E^\perp = 0$, otherwise E can be extended further, so $\overline{[\![E]\!]} = H$.
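The Gram-Schmidt algorithm of Example 2 translates directly into code. The following is a minimal sketch for finite lists of vectors in C^N, using the same convention of an inner product that is conjugate-linear in the first argument; the tolerance `tol` used to decide when u_n is (numerically) zero is an assumption of this sketch, not part of the algorithm as stated.

```python
def inner(x, y):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal list with the same span; dependent vectors are discarded."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = inner(e, v)                       # coefficient <e_i, v_n>
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = abs(inner(u, u)) ** 0.5
        if norm > tol:                            # u_n = 0: discard v_n
            basis.append([ui / norm for ui in u])
    return basis

# The second vector is a multiple of the first, so it is discarded
E = gram_schmidt([[1, 1, 0], [2, 2, 0], [1, 0, 1]])
```

Three input vectors spanning a plane produce two orthonormal vectors, mirroring the relabeling step described in the example.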


Fourier Expansion 


The utility of orthonormal bases lies in the ease of calculation of the inner product: 


Proposition 10.29 Parseval’s identity 


If $x = \sum_n \alpha_n e_n$ and $y = \sum_n \beta_n e_n$, where $\{e_n\}$ are orthonormal, then

\[ \langle x, y\rangle = \sum_n \bar\alpha_n \beta_n. \]

In particular, $\|x\| = \bigl(\sum_n |\alpha_n|^2\bigr)^{1/2}$.


Proof A simple expansion of the two series in the inner product, making essential use of the linearity and continuity of $\langle\ ,\ \rangle$ as well as orthonormality, gives the result:

\[ \langle x, y\rangle = \sum_n \sum_m \bar\alpha_n \beta_m \langle e_n, e_m\rangle = \sum_n \bar\alpha_n \beta_n. \qquad \square \]

Parseval's identity is the generalization of Pythagoras' theorem to infinite dimensions. The question remains: when can a vector be written as a series of orthonormal vectors? The next proposition and theorem give an answer.
Proposition 10.30


Let $\{e_1, e_2, \ldots\}$ be a countable orthonormal set of vectors in a Hilbert space H, then

\[ \sum_{n=1}^\infty \alpha_n e_n \text{ converges in } H \iff (\alpha_n) \in \ell^2. \]


Proof By Pythagoras' theorem we have

\[ \|\alpha_n e_n + \cdots + \alpha_m e_m\|^2 = |\alpha_n|^2 + \cdots + |\alpha_m|^2. \]

This shows that $\sum_n \alpha_n e_n$ is a Cauchy sequence in H if and only if $\sum_n |\alpha_n|^2$ is Cauchy in $\mathbb{C}$ (Example 7.20(1)). Since H and $\mathbb{C}$ are complete, $\sum_n \alpha_n e_n$ converges if, and only if, $(\alpha_n)$ is in $\ell^2$. □


The convergence of $\sum_n \alpha_n e_n$ need not be absolute in infinite dimensions; for the latter to be true requires that $\sum_n \|\alpha_n e_n\| = \sum_n |\alpha_n|$ converges, that is, $(\alpha_n) \in \ell^1 \subset \ell^2$. Nevertheless, a rearrangement $\sigma$ of an orthonormal basis does not affect the expansion, $\sum_n \alpha_n e_n = \sum_n \alpha_{\sigma(n)} e_{\sigma(n)}$, because $e_{\sigma(n)}$ remain orthonormal and $(\alpha_{\sigma(n)}) \in \ell^2$.


Theorem 10.31 Bessel’s inequality 


If $\{e_1, e_2, \ldots\}$ are orthonormal in an inner product space, then

\[ \sum_n |\langle e_n, x\rangle|^2 \le \|x\|^2. \]

When $\{e_n\}$ is an orthonormal basis of a Hilbert space,

\[ x = \sum_n \langle e_n, x\rangle e_n. \]


Proof (i) Fix x and let $x_N := \sum_{n=1}^N \langle e_n, x\rangle e_n$. Writing $\alpha_n := \langle e_n, x\rangle$, we have

\[ \begin{aligned}
0 \le \|x - x_N\|^2 &= \|x\|^2 - \langle x_N, x\rangle - \langle x, x_N\rangle + \langle x_N, x_N\rangle \\
&= \|x\|^2 - 2\sum_{n=1}^N \bar\alpha_n \alpha_n + \sum_{n,m=1}^N \bar\alpha_n \alpha_m \langle e_n, e_m\rangle \\
&= \|x\|^2 - \sum_{n=1}^N |\alpha_n|^2,
\end{aligned} \]

hence

\[ \sum_{n=1}^N |\langle e_n, x\rangle|^2 \le \|x\|^2. \qquad (10.3) \]

As a bounded increasing series, the left-hand side must converge as $N \to \infty$, and Bessel's inequality holds.

As a matter of fact, even if $\{e_i\}$ is an uncountable orthonormal set of vectors, the same analysis can be made for any finite subset of them. Inequality (10.3) then shows that there can be at most $N - 1$ vectors $e_i$ with $|\langle e_i, x\rangle|^2 > \|x\|^2/N$, for any positive integer N, and so only a countable number of terms with $\langle e_i, x\rangle \ne 0$. Therefore $\sum_i |\langle e_i, x\rangle|^2$ is in fact a countable sum, bounded above by $\|x\|^2$.


(ii) By the previous proposition, the series $\sum_n \langle e_n, x\rangle e_n$ converges in a Hilbert space, say to $y \in H$. But $x - y \in \{e_1, e_2, \ldots\}^\perp = 0$, since for all $N \in \mathbb{N}$,

\[ \langle e_N, x - y\rangle = \langle e_N, x\rangle - \sum_{n=1}^\infty \langle e_n, x\rangle\langle e_N, e_n\rangle = 0. \]

An orthonormal basis is thus a Schauder basis. □
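A finite-dimensional check makes the difference between the two statements concrete. In the sketch below, the vectors and the use of exact rational arithmetic are illustrative choices; e1, e2 form an orthonormal set in R³ that is not a basis, and adding e3 completes it.

```python
from fractions import Fraction as Fr

def inner(x, y):
    """Real inner product."""
    return sum(a * b for a, b in zip(x, y))

e1 = [Fr(1), Fr(0), Fr(0)]
e2 = [Fr(0), Fr(3, 5), Fr(4, 5)]
e3 = [Fr(0), Fr(-4, 5), Fr(3, 5)]
x = [Fr(1), Fr(2), Fr(3)]

norm_sq = inner(x, x)                                       # ||x||^2 = 14
partial = sum(inner(e, x) ** 2 for e in (e1, e2))           # Bessel: strictly smaller here
full = sum(inner(e, x) ** 2 for e in (e1, e2, e3))          # equality for a basis
expansion = [sum(inner(e, x) * e[i] for e in (e1, e2, e3))  # sum of <e_n, x> e_n
             for i in range(3)]
```

With only two of the three vectors the sum of squared coefficients is 349/25 < 14; with all three it equals 14 and the expansion reproduces x.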


Proposition 10.32 


Every N-dimensional Hilbert space is unitarily isomorphic to $\mathbb{R}^N$ or $\mathbb{C}^N$. Every separable infinite-dimensional Hilbert space is unitarily isomorphic to $\ell^2$ (real or complex).


Proof Suppose H is a separable Hilbert space, with some dense countable subset $A = \{a_1, a_2, \ldots\}$. The Gram-Schmidt process converts this to a list of orthonormal vectors $E = \{e_1, e_2, \ldots\}$, which is then a countable orthonormal basis of H since $\overline{[\![E]\!]} = \overline{[\![A]\!]} \supseteq \overline{A} = H$.

Consider the map

\[ J : H \to \ell^2, \qquad x \mapsto (\alpha_n), \quad \alpha_n := \langle e_n, x\rangle. \]

Bessel's inequality shows that $(\alpha_n)$ is indeed in $\ell^2$ (if H is a real Hilbert space, $\alpha_n$ are also real). Linearity of J follows from that of the inner product. Preservation of the inner products and norms, $\langle x, y\rangle_H = \langle Jx, Jy\rangle_{\ell^2}$, is precisely the content of Parseval's identity.

J is onto: for any $(\alpha_n) \in \ell^2$, the series $\sum_n \alpha_n e_n$ converges to some vector x by Proposition 10.30, and this is mapped by J to

\[ Jx = \Bigl(\bigl\langle e_n, \sum_m \alpha_m e_m\bigr\rangle\Bigr) = (\alpha_n). \]

The Hilbert space is N-dimensional precisely when E has N vectors; in this case it is a classical basis of H. J remains a surjective isometry, with $\mathbb{R}^N$ or $\mathbb{C}^N$ replacing $\ell^2$. □


Joseph Fourier (1768-1830) A Napoleonic supporter, almost guillotined in the aftermath of the French revolution, he succeeded his teacher Lagrange in 1797. Besides being a government official and an accomplished Egyptologist, his mathematical work culminated in his 1822 book on Fourier series: "sines and cosines as the atoms of all functions"; it revolutionized how differential equations were solved. But Lagrange had pointed out that the expansion might not be unique, or even exist. Which functions have a Fourier series? This question led to refined treatments of integration such as Riemann's, and to Cantor's set theory; but also to studies into what convergence of functions is all about, when it is not pointwise.


Fig. 10.5 Fourier


Examples of Orthonormal Bases 


Orthonormal bases are widely used to approximate functions, and are indispensable 
for actual calculations. There are various orthonormal bases commonly used for the 
space of L? functions on different domains. Each basis has particular properties that 
are useful in specific contexts. One should treat these in the same way that one treats 
bases in finite-dimensional vector spaces — a suitable choice of basis may make a 
problem amenable. For example, for a problem that has spherical symmetry, it would 
probably make sense to use an orthonormal basis adapted to spherical symmetry. 

Consider the simplest domain, the real line. There are three different classes of
non-empty closed intervals (up to a homeomorphism): $[a, b]$, $[a, \infty[$, and $\mathbb{R}$. Various
orthonormal bases have been devised for each, with the most popular being listed
here.

$L^2[a, b]$: Fourier series


Proposition 10.33
The functions $e^{2\pi i n x}$, $n \in \mathbb{Z}$, form an orthonormal basis for $L^2[0, 1]$.


Proof Orthonormality of the functions is trivial to establish,
$$\langle e^{2\pi i n x}, e^{2\pi i m x}\rangle = \int_0^1 e^{2\pi i x(m-n)}\,dx = \delta_{nm}.$$
Suppose $f \in \{e^{2\pi i n x}\}^\perp$, i.e., $\int_0^1 e^{-2\pi i n x} f(x)\,dx = 0$ for all $n \in \mathbb{Z}$. Recall that the
Fourier coefficients give a 1-1 operator $F : L^1[0, 1] \to c_0(\mathbb{Z})$ (Theorem 9.25) (note:
$L^2[0, 1] \subset L^1[0, 1]$), so $Ff = 0$ implies $f = 0$ and hence $\{e^{2\pi i n x} : n \in \mathbb{Z}\}^\perp = 0$. □

Of course, there is nothing special about the interval $[0, 1]$. Any other interval
$[a, b]$ has a modified Fourier basis. For example, $\{\frac{1}{\sqrt{2\pi}} e^{inx} : n \in \mathbb{Z}\}$ is an orthonormal
basis for $L^2[-\pi, \pi]$.


Examples 10.34


1. ▸ The Fourier expansion becomes, for $f \in L^2[0, 1]$,
$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}$$
where $\alpha_n = \langle e^{2\pi i n x}, f\rangle = \int_0^1 e^{-2\pi i n x} f(x)\,dx$ are the Fourier coefficients of $f$,
and the convergence is in $L^2[0, 1]$, not necessarily pointwise. (However, a difficult
proof [39] shows that there is pointwise convergence a.e.; see also Example 11.29(5).)


2. The classical Parseval identity is
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |a_n|^2 + |b_n|^2$$
where $a_n - i b_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-inx} f(x)\,dx$ are the $L^2[-\pi, \pi]$-Fourier coefficients.


3. Fourier series have a wide range of applications, especially in signal processing.
For example, the operator $F^* 1_{[-N,N]} F$ is called a low(frequency)-pass filter:
Given a signal $f$, $1_{[-N,N]}$ discards the higher-frequency terms from the Fourier
coefficients $Ff$; $F^*$ then builds a function from the remaining coefficients,
resulting in a smoothed out low frequency band signal (for example, without a
high frequency hiss).
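
The low-pass filter $F^* 1_{[-N,N]} F$ can be tried out numerically. The following is only a minimal sketch, under the assumption that the signal is known through equally spaced samples on $[0, 1]$, so that the Fourier coefficients are approximated by Riemann sums; the function names and the test signal are made up for the illustration.

```python
import cmath
import math

def fourier_coefficients(samples, max_freq):
    """Approximate alpha_n = int_0^1 exp(-2 pi i n x) f(x) dx for |n| <= max_freq
    by a Riemann sum over equally spaced samples of f."""
    count = len(samples)
    return {
        n: sum(value * cmath.exp(-2j * math.pi * n * k / count)
               for k, value in enumerate(samples)) / count
        for n in range(-max_freq, max_freq + 1)
    }

def low_pass(samples, cutoff):
    """Apply F* 1_[-cutoff, cutoff] F: rebuild the signal from the low frequencies only."""
    count = len(samples)
    alpha = fourier_coefficients(samples, cutoff)
    return [
        sum(a * cmath.exp(2j * math.pi * n * k / count) for n, a in alpha.items()).real
        for k in range(count)
    ]

# A slow oscillation (frequency 2) plus a high-frequency "hiss" (frequency 40).
count = 256
grid = [k / count for k in range(count)]
slow = [math.sin(2 * math.pi * 2 * x) for x in grid]
noisy = [s + 0.3 * math.sin(2 * math.pi * 40 * x) for s, x in zip(slow, grid)]

smoothed = low_pass(noisy, cutoff=10)
max_error = max(abs(a - b) for a, b in zip(smoothed, slow))  # distance from the clean signal
```

With the cutoff placed between the two frequencies, the filtered samples should agree with the slow oscillation up to rounding error.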


$L^2[-1, 1]$: Legendre polynomials


We've seen that the set of polynomials is dense in the space $L^2[a, b]$ (Proposition 9.20)
but the simplest basis, namely $1, x, x^2, \ldots$, is not orthogonal, as can be
easily verified by calculating, say, $\langle 1, x^2\rangle = (b^3 - a^3)/3$. This can be rectified by
applying the Gram-Schmidt algorithm. On the interval $[-1, 1]$, the resulting polynomials
are called the (normalized) Legendre polynomials (Fig. 10.6). The first few
are
$$\frac{1}{\sqrt{2}}, \quad \frac{\sqrt{3}}{\sqrt{2}}\,x, \quad \frac{3\sqrt{5}}{2\sqrt{2}}\Big(x^2 - \frac{1}{3}\Big), \ \ldots$$

Fig. 10.6 Orthonormal bases: Legendre polynomials, Laguerre functions, Hermite functions
(The first ten functions of each basis are plotted as rows in each
image; brightness is proportional to the value of the function, mid-grey being 0)

with the general formula being
$$p_n(x) = \frac{\sqrt{n + \frac{1}{2}}}{2^n n!}\Big(\frac{d}{dx}\Big)^n (x^2 - 1)^n.$$
These polynomials satisfy the differential equation
$$L p_n = -n(n+1)\,p_n, \quad\text{where } L = D(1 - x^2)D = (1 - x^2)D^2 - 2xD.$$
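
The Gram-Schmidt construction can be checked by machine. The sketch below is an illustration only: it represents a polynomial by its list of coefficients (a choice made here, not in the text) and uses $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$, zero for odd $k$.

```python
import math

def inner(p, q):
    """<p, q> = integral over [-1, 1] of p(x) q(x) dx, for coefficient lists
    p = [c0, c1, ...] representing c0 + c1 x + c2 x^2 + ..."""
    total = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * 2.0 / (i + j + 1)
    return total

def gram_schmidt_monomials(count):
    """Orthonormalize 1, x, x^2, ... in L^2[-1, 1]."""
    basis = []
    for degree in range(count):
        v = [0.0] * degree + [1.0]            # the monomial x^degree
        for e in basis:
            c = inner(e, v)                   # component of v along e
            padded = e + [0.0] * (len(v) - len(e))
            v = [vi - c * ei for vi, ei in zip(v, padded)]
        norm = math.sqrt(inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis

legendre = gram_schmidt_monomials(3)
# Coefficients of the third polynomial, to be compared with (3*sqrt(5)/(2*sqrt(2))) (x^2 - 1/3).
third = legendre[2]
```

The three coefficient lists should reproduce the first few normalized Legendre polynomials listed above.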
$L^2[0, \infty[$: Laguerre functions


This Hilbert space does not contain any polynomials $x^n$, but their modified versions
$x^n e^{-x/2}$ do belong. A Gram-Schmidt orthonormalization of them gives the Laguerre
functions, the first few terms of which are
$$e^{-x/2}, \quad (1 - x)\,e^{-x/2}, \quad \big(1 - 2x + \tfrac{1}{2}x^2\big)\,e^{-x/2}, \ \ldots$$
and the general formula is
$$l_n(x) = \frac{1}{n!}\,e^{x/2} D^n (x^n e^{-x}).$$
The Laguerre functions satisfy (prove!)
$$S l_n = -\big(n + \tfrac{1}{2}\big)\,l_n, \quad\text{where } S := DxD - x/4.$$

The Laguerre polynomials (the polynomial part of $l_n$) can also be thought of as an
orthonormal basis for $L^2_w(\mathbb{R}^+)$ with the weight $e^{-x}$.


$L^2(\mathbb{R})$: Hermite functions

Here, orthonormalization is performed on the functions $x^n e^{-x^2/2}$ (equivalently, take
$x^n$ in $L^2_w(\mathbb{R})$ with the weight $e^{-x^2}$) to get the Hermite functions,
$$\frac{1}{\pi^{1/4}}\,e^{-x^2/2}, \quad \frac{\sqrt{2}}{\pi^{1/4}}\,x\,e^{-x^2/2}, \quad \frac{1}{\sqrt{2}\,\pi^{1/4}}\,(2x^2 - 1)\,e^{-x^2/2}, \ \ldots$$
$$h_n(x) = \frac{(-1)^n}{\sqrt{2^n n! \sqrt{\pi}}}\,e^{x^2/2} D^n e^{-x^2}.$$
To prove orthogonality, first show that $D(e^{x^2} D^n e^{-x^2}) = -2n\,e^{x^2} D^{n-1} e^{-x^2}$, and
deduce that $\langle h_n, h_m\rangle = 2n\langle h_{n-1}, h_{m-1}\rangle$. The Hermite functions satisfy
$$R h_n = -(2n+1)\,h_n, \quad\text{where } R := D^2 - x^2.$$


Other Domains


Some other useful orthogonal bases on $L^2(A)$ spaces are, in brief:

Circle $L^2(S^1)$: Since the circle $S^1$ is essentially the interval $[0, 2\pi]$ as far as
$L^2$-functions are concerned, the periodic Fourier functions $e^{in\theta}$ form an orthogonal
basis for it.

The Chebyshev polynomials, $T_n(\cos\theta) := \cos n\theta$, are the projection of the $\cos n\theta$
part of this Fourier basis, from the unit semi-circle to the x-axis $[-1, 1]$. They are thus
orthogonal on $L^2_w[-1, 1]$ with the weight $1/\sqrt{1 - x^2}$ (since $d\theta = -dx/\sqrt{1 - x^2}$).

There are many other orthonormal bases adapted to $L^2_w[a, b]$. Rodrigues' formula
describes orthogonal functions on $L^2_w[a, b]$,
$$f_n(x) = w(x)^{-1} D^n\big(w(x)\,p(x)^n\big)$$
for a quadratic polynomial $p$ with roots at the endpoints $a, b$, and weight function
$w$: the Legendre, Laguerre, Hermite, and Chebyshev functions are all of this type.

Plane $L^2(\mathbb{R}^2)$: An orthonormal basis for the plane can be obtained by multiplying
Hermite functions $h_n(x) h_m(y)$. In general, if $e_n(x)$ and $\tilde e_m(y)$ are orthonormal bases
for $L^2(A)$ and $L^2(B)$, then $e_n(x)\tilde e_m(y)$ form an orthonormal basis of $L^2(A \times B)$.

Disk $L^2(B_1(0))$ Bessel functions: The functions on the unit disk taking the value
zero at the boundary have an orthogonal basis $J_n(\lambda_{mn} r)\,e^{in\theta}$, where $\lambda_{mn}$ are the
zeros of the Bessel function $J_n(x) := \sum_{m=0}^{\infty} \frac{(-1)^m}{m!\,(n+m)!}\,(x/2)^{2m+n}$ (Fig. 10.7).

Sphere $L^2(S^2)$ Spherical Harmonics:
$$Y_l^m(\theta, \phi) := \sqrt{\frac{(2l+1)(l-m)!}{4\pi\,(l+m)!}}\;P_l^m(\cos\theta)\,e^{im\phi},$$
where $P_l^m(x) = (-1)^m (1 - x^2)^{m/2} D^m P_l(x)$ are the "associated Legendre functions".
They depend on two indices, $l \in \mathbb{N}$ and $m = -l, \ldots, +l$.

Fig. 10.7 Bessel's functions, $J_n(\lambda_{mn} r)\cos(n\theta)$, $n, m = 0, 1, 2$


Exercises 10.35
1. Orthonormal vectors must be linearly independent.


2. In finite dimensions, orthonormal bases span the vector space, $[\![e_1, \ldots, e_N]\!] = H$
(Theorem 8.22).
In infinite dimensions, an orthonormal basis is not a basis in the linear algebra
sense (Hamel basis), which requires the stronger spanning condition $[\![E]\!] = H$.


3. Comparing coefficients: if $\sum_n \alpha_n e_n = \sum_n \beta_n e_n$, then $\alpha_n = \beta_n$.


4. If $\{e_n\}$ and $\{\tilde e_m\}$ are orthonormal bases for Hilbert spaces $X$ and $Y$ respectively,
then $\{(e_n, 0)\} \cup \{(0, \tilde e_m)\}$ form an orthonormal basis for $X \times Y$
(Exercise 10.10(6)).


5. Let $E := \{e_1, e_2, \ldots\}$ be a set of orthonormal vectors, with $\overline{[\![E]\!]} = M \subseteq H$.
For any $x \in H$, the sum $\sum_n \langle e_n, x\rangle e_n$ gives the closest point $x_*$ in $M$ to $x$.


6. ▸ An operator $U \in B(H_1, H_2)$ is a unitary isomorphism if, and only if, it maps
orthonormal bases to orthonormal bases.


7. * It is quite possible for $x = \sum_n \langle e_n, x\rangle e_n$ to hold true for all $x$ in a Hilbert
space, without $e_n$ being orthonormal. Find three such vectors $e_1, e_2, e_3$ in $\mathbb{R}^2$.
But if Parseval's identity $\|x\|^2 = \sum_n |\langle e_n, x\rangle|^2$ holds for all $x \in H$, and $\|e_n\| = 1$
for all $n$, then the vectors $e_n$ form an orthonormal basis.


8. Expand the function $x$ on $[0, 1]$ as a Fourier series.

(a) Assuming pointwise convergence, deduce Gregory's formula
$$1 - \frac{1}{3} + \frac{1}{5} - \cdots = \frac{\pi}{4}.$$
(b) Use Parseval's identity to deduce Euler's formula
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6}.$$


9. When $f \in L^2[0, 1]$ is an even function about $\frac{1}{2}$, meaning $f(\frac{1}{2} + x) = f(\frac{1}{2} - x)$,
then $\alpha_{-n} = \alpha_n$ and
$$\sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x} = \alpha_0 + \sum_{n=1}^{\infty} 2\alpha_n \cos(2\pi n x).$$
What if $f$ is odd, or neither odd nor even?


10. Show that $\cos n\pi x$, $n = 0, 1, \ldots$, is an orthogonal basis for the real space
$L^2[0, 1]$.


11. Show that $Uf(x) := \frac{1}{\sqrt{b-a}}\,f\big(\frac{x-a}{b-a}\big)$ is a unitary operator $L^2[0, 1] \to L^2[a, b]$.
Hence find an orthonormal basis for $L^2[a, b]$.


12. ▸ The Fourier operator $F : L^2[0, 1] \to \ell^2$ is a unitary isomorphism between
Hilbert spaces. Its adjoint is $F^*(\alpha_n) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}$.


13. Prove that the Legendre polynomials are orthonormal in $L^2[-1, 1]$, as follows:
Define $u_n(x) := (x^2 - 1)^n$, and $q_n := D^n u_n$; show by induction that

(a) $D^k u_n(\pm 1) = 0$, for $k < n$,
(b) $\langle D^n u_n, D^m u_m\rangle = -\langle D^{n-1} u_n, D^{m+1} u_m\rangle$,
(c) $\langle q_n, q_m\rangle = 0$ unless $n = m$.


14. * The Legendre polynomials $P_n := p_n/\sqrt{n + \frac{1}{2}}$ have the property,
$$1/\|u - y\| = \sum_{n=0}^{\infty} P_n(\cos\theta)\,r^n$$
where $u$ is a unit vector, $r := \|y\| < 1$, and $\theta$ is the angle between $u$ and $y$.
(Hint: Show $f_r(x) := 1/\sqrt{1 + r^2 - 2rx}$ satisfies $L f_r = -r\,\partial_r^2(r f_r)$, then write
$f_r(x) = \sum_n a_n(r) P_n(x)$.)


15. ▸ A frame is a sequence of vectors $e_n \in H$ (not necessarily linearly independent)
for which the mapping $J : x \mapsto (\langle e_n, x\rangle)_{n \in \mathbb{N}}$ is an embedding $H \to M \subseteq \ell^2$.
By Proposition 7.12, this is equivalent to there being positive constants $a, b > 0$,
$a\|x\|_H \le \|Jx\|_{\ell^2} \le b\|x\|_H$, i.e.,
$$\exists c > 0, \quad \frac{1}{c}\|x\|^2 \le \sum_n |\langle e_n, x\rangle|^2 \le c\|x\|^2.$$
Let $\delta_n(a_m) := a_n$ and $L := (J^*)^{-1}$; then $x \mapsto \delta_n L x$ is a continuous functional,
hence there is a unique vector $\tilde e_n$ such that $\delta_n L x = \langle \tilde e_n, x\rangle$.

(a) The two sets of vectors $e_n$ and $\tilde e_n$ are bi-orthogonal, that is, $\langle \tilde e_m, e_n\rangle = \delta_{mn}$.
(b) $J^* L = I = L^* J$, so
$$x = \sum_n \langle \tilde e_n, x\rangle e_n = \sum_n \langle e_n, x\rangle \tilde e_n.$$

Applications 
Frequency-Time Orthonormal Bases 


An improvement on the classical orthonormal bases for functions $t \mapsto f(t)$ in
$L^2(\mathbb{R})$ are bases that give information in both 'frequency' and 'time'. In contrast, the
Fourier coefficients, for example, only give information about the frequency content
of the function. A large $n$th Fourier coefficient means that there is a substantial
amount of the term $e^{2\pi i n t}$ somewhere in the function $f(t)$, without indicating at all
where. The aim of frequency-time bases is to have coefficients $\alpha_{m,n}$ that depend on
two parameters $n$ and $m$, one of which is a frequency index, the other a "time" index.
The $\alpha_{m,n}$ coefficients, much like musical notes placed on a score, indicate how much
of the frequency corresponding to $n$ is "played" at the time corresponding to $m$;
they are able to track the change of frequency content of $f$ with time. Of course, the
reference to $t$ as time is not of relevance here; $t$ can represent any other varying real
quantity.

Windowed Fourier Bases (Short Time Fourier Transform): A basic way to achieve
this is to define the basis functions by
$$h_{m,n}(t) := e^{2\pi i n t}\,h(t - m),$$
where $h$ is a carefully chosen (real) window function, with $\|h\|_{L^2} = 1$, such that
$h_{m,n}$ are orthonormal. The simplest choice of window function is $h = 1_{[0,1]}$; other
popular possibilities, such as the Hann window $\cos^2(\pi t)$ ($-\frac{1}{2} < t < \frac{1}{2}$) and the
Gaussian $c\,e^{-t^2/2\sigma^2}$, do not give orthonormal bases but are useful nonetheless.

One can then obtain a picture of $f$ spread out in time and frequency, called a
spectrogram (Fig. 10.8), by plotting the coefficients $|\langle h_{m,n}, f\rangle|^2$ (often letting $m$ and
$n$ vary continuously in $\mathbb{R}$ and $\mathbb{R}^+$ respectively to get a smooth picture).

Fig. 10.8 Spectrogram of a piano piece (axes: time $t$, frequency $f$), showing clearly the
duration, frequency, and harmonics of each note

Note that the coefficients $\langle h_{m,n}, f\rangle$ are really just $(\alpha_{m,n}) = F(h(t - m) f(t))$.
So summing the coefficients in $n$, keeping the position $m$ fixed, gives the windowed
function:
$$\sum_n \alpha_{m,n}\,e^{2\pi i n t} = h(t - m) f(t)$$
and similarly, when $\sum_m h(t - m) = 1$,
$$Ff(n) = \int e^{-2\pi i n t} \sum_m h(t - m) f(t)\,dt = \sum_m \alpha_{m,n}.$$
The greatest disadvantage of these bases is that the window ‘width’ is predetermined; 
it ought to be large enough to contain the low frequency oscillations, but then the 
time localization of the high frequencies is lost. The aim of the windowed Fourier 
basis is only achieved over a limited range of frequencies. To circumvent this, one 
can make the window width decrease with the frequency parameter n — this is the 
idea of wavelets. 
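
To make the windowed coefficients concrete, here is a small numerical sketch. It assumes the rectangular window (the indicator of $[0, 1)$) and approximates each coefficient $\langle h_{m,n}, f\rangle$ by a Riemann sum; the test signal, a "tune" of two notes, and the sampling rate are invented for the illustration.

```python
import cmath
import math

SAMPLES_PER_UNIT = 64  # samples per unit of time (an arbitrary choice)

def signal(t):
    """Two notes: frequency 3 during [0, 1), then frequency 8 during [1, 2)."""
    return math.sin(2 * math.pi * 3 * t) if t < 1 else math.sin(2 * math.pi * 8 * t)

def windowed_coefficient(f, m, n):
    """alpha_{m,n} = <h_{m,n}, f> with h the indicator of [0, 1),
    approximated by a Riemann sum over the window [m, m + 1)."""
    total = 0j
    for k in range(SAMPLES_PER_UNIT):
        t = m + k / SAMPLES_PER_UNIT
        total += cmath.exp(-2j * math.pi * n * t) * f(t)
    return total / SAMPLES_PER_UNIT

# Spectrogram values |alpha_{m,n}|^2 for time slots m = 0, 1 and frequencies n = 0..10.
spectrogram = [[abs(windowed_coefficient(signal, m, n)) ** 2 for n in range(11)]
               for m in range(2)]

# The dominant frequency in each time slot.
loudest = [max(range(11), key=lambda n: row[n]) for row in spectrogram]
```

Unlike the plain Fourier coefficients of the whole signal, the rows of `spectrogram` show which frequency is present in which time slot.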

Wavelet Bases: The basis in this case consists of the following functions in $L^2(\mathbb{R})$,
$$\psi_{m,n}(t) = T_m S_{2^n}\psi(t) = 2^{n/2}\,\psi(2^n t - m), \qquad (m, n \in \mathbb{Z})$$


where $\psi$ and $\phi$ are carefully chosen 'mother' and 'father' functions in $L^2(\mathbb{R})$. The
function $\psi$ serves both as a window (ideally with compact support) and an oscillation.
The basis functions $\psi_{m,n}$ are then scaled and translated versions of $\psi$. They
have the advantage that the resolution in 'time' is better for higher frequencies than
the windowed Fourier bases, and so require fewer coefficients to represent a function
to the same level of detail. One example is the classical Haar basis, generated
by $\psi(t) := 1_{[0,1]} - 1_{[1,2]}$ (prove orthogonality of $\psi_{m,n}$). Other wavelets, generated
by continuous functions, are more popular, e.g. Mexican-hat ($(1 - t^2)e^{-t^2/2}$),
Gabor/Morlet ($e^{-t^2/2} e^{2\pi i f t}$, usually $f = 1$; Fig. 10.9). The analogue of the spectrogram
is the scalogram, which is a plot of the coefficients $Wf(a, b) := \langle \psi_{a,b}, f\rangle$
where $\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\big(\frac{t-b}{a}\big)$.

In a multi-resolution wavelet scheme, a subspace $V_n$ of the Hilbert space $L^2(\mathbb{R})$
is split recursively into low and high resolution parts as $V_{n+1} = V_n \oplus W_n$, where
$W_n := V_{n+1} \cap V_n^\perp$,
$$V_n = V_{n-1} \oplus W_{n-1} = \cdots = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_{n-1}.$$

Fig. 10.9 Three wavelets: Haar, Mexican hat (with a translated and scaled version), and Morlet
(real and imaginary parts)

If we suppose $V_n$ and $W_n$ to be spanned by orthonormal bases $\{\phi_{m,n} : m = 0, \ldots, N-1\}$
and $\{\psi_{m,n} : m = 1, \ldots, N-1\}$, that are generated by scaling
and translation from "father" and "mother" wavelets $\phi$ and $\psi$ respectively, then,
by recursion, one need only ensure $V_1 = V_0 \oplus W_0 = [\![\phi]\!] \oplus [\![\psi]\!]$ for this scheme to
work. Therefore the requirements are that $\phi, \psi \in V_1$ be orthonormal. For $N$ even,
the following "refinement equations" are sufficient,
$$\phi(x) = a_0\phi(2x) + a_1\phi(2x - 1) + \cdots + a_{N-1}\phi(2x - N + 1),$$
$$\psi(x) = a_{N-1}\phi(2x) - a_{N-2}\phi(2x - 1) + \cdots - a_0\phi(2x - N + 1),$$
$$a_0^2 + \cdots + a_{N-1}^2 = 2.$$
Recall here that $\phi(2x - m) = 2^{-1/2}\phi_{m,1}(x)$ has norm $1/\sqrt{2}$, so $\|\phi\|^2 = \sum_n a_n^2/2$.
For example, the Haar basis satisfies $\phi(t) = \phi(2t) + \phi(2t - 1)$, $\psi(t) = \phi(2t) - \phi(2t - 1)$.
The Daubechies wavelet basis of order $N$ is a multi-resolution scheme
with an optimal choice of coefficients $a_n$, in which the wavelet $\psi$ is taken to be of
compact support and 'smooth' (more precisely, with $N$ zero moments; see [27]).
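
The recursive splitting $V_{n+1} = V_n \oplus W_n$ is easiest to see in the Haar case. The sketch below works with a discrete signal of $2^k$ samples (an assumption made for the illustration): each step replaces neighbouring pairs by their normalized sum (the coarse part) and difference (the detail part), which mirrors the Haar refinement equations with $a_0 = a_1 = 1$.

```python
import math

def haar_step(values):
    """Split a signal of even length into a coarse part and a detail part."""
    s = 1 / math.sqrt(2)
    coarse = [s * (values[i] + values[i + 1]) for i in range(0, len(values), 2)]
    detail = [s * (values[i] - values[i + 1]) for i in range(0, len(values), 2)]
    return coarse, detail

def haar_decompose(values):
    """Repeat the split on the coarse part until one coefficient is left.
    The length of `values` is assumed to be a power of 2."""
    details = []
    coarse = list(values)
    while len(coarse) > 1:
        coarse, detail = haar_step(coarse)
        details.insert(0, detail)          # coarsest details first
    return coarse, details

def haar_reconstruct(coarse, details):
    """Invert haar_decompose, going from coarse to fine."""
    s = 1 / math.sqrt(2)
    values = list(coarse)
    for detail in details:
        values = [x for c, d in zip(values, detail) for x in (s * (c + d), s * (c - d))]
    return values

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = haar_decompose(signal)
energy_in = sum(v * v for v in signal)
energy_out = sum(c * c for c in coarse) + sum(d * d for ds in details for d in ds)
restored = haar_reconstruct(coarse, details)
```

Because each step is an orthonormal change of basis, the sum of squares of the coefficients equals that of the signal (Parseval's identity), and the signal is recovered exactly from its coefficients.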


Solving Linear Equations 


Orthonormal expansions can be used to solve linear equations $Tx = y$, where $x$ and
$y$ are elements of some (separable) Hilbert space, and $T$ an operator on it. Given
an orthonormal basis $\{e_n\}$, the vectors $x$ and $y$ can be written in terms of it as
$x = \sum_n a_n e_n$ and $y = \sum_n b_n e_n$. Of these, the scalar coefficients $a_n := \langle e_n, x\rangle$
are unknown and to be determined, but $b_n := \langle e_n, y\rangle$ can be calculated explicitly.
Substituting into $Tx = y$ we get
$$\sum_n a_n T e_n = T\Big(\sum_n a_n e_n\Big) = \sum_n b_n e_n.$$
Moreover the vectors $Te_n$ can also be expanded as $Te_n = \sum_m T_{m,n} e_m$ for some numbers
$T_{m,n} = \langle e_m, Te_n\rangle$. So, comparing coefficients in the equation
$\sum_{n,m} a_n T_{m,n} e_m = \sum_m b_m e_m$, we find
$$\sum_n T_{m,n} a_n = b_m.$$
This can be thought of as a matrix equation in $\ell^2$ with the matrix $[T_{m,n}]$ having a
countable number of rows and columns:
$$\begin{pmatrix} T_{1,1} & T_{1,2} & \cdots \\ T_{2,1} & T_{2,2} & \cdots \\ \vdots & & \ddots \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \end{pmatrix}$$
It is precisely the equation $Tx = y$ written in terms of the coefficients of $T$, $x$ and $y$
in the orthonormal basis $e_n$. Effectively, the problem has been transferred from one
in $H$ to one in $\ell^2$, via the isomorphism $J : H \to \ell^2$.

For practical purposes, one can truncate the matrix and vectors to yield a finite
$N \times N$ matrix equation that can then be solved. This can be justified because the
remainder terms of $y$ and $x$, namely $\sum_{n=N+1}^{\infty} b_n e_n$, etc., converge to 0 as $N \to \infty$.
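
The effect of truncation can be observed on a toy example. In the sketch below the matrix $T_{m,n} = \delta_{mn} + 2^{-(m+n)}$ and the right-hand side $b_m = 1/m^2$ are hypothetical choices made only for this illustration, and the small Gaussian elimination routine is a stand-in for any linear solver.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(rows[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (rows[r][n] - tail) / rows[r][r]
    return solution

def truncated_solution(size):
    """Solve the size x size truncation of sum_n T_{m,n} a_n = b_m."""
    T = [[(1.0 if m == n else 0.0) + 2.0 ** -(m + n) for n in range(1, size + 1)]
         for m in range(1, size + 1)]
    b = [1.0 / m ** 2 for m in range(1, size + 1)]
    return solve(T, b)

# The first coefficient a_1 as computed from truncations of increasing size.
first_coefficient = {size: truncated_solution(size)[0] for size in (2, 4, 8, 16)}
```

For this example the values of $a_1$ settle down quickly as the truncation size grows; how fast this happens in general depends on the operator and on the decay of the coefficients $b_n$.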

For theoretical purposes, the method is useful if the orthonormal basis elements
$e_n$ are eigenvectors of $T$, that is, $Te_n = \lambda_n e_n$. This makes the matrix of $T$ diagonal.
The equation is easily solved, $a_n = b_n/\lambda_n$, unless $\lambda_n = 0$. If $\lambda_n = 0$ (i.e., $Tx = 0$
has non-trivial solutions) there are no solutions of $0\,a_n = b_n$, unless $b_n = 0$, in which
case the $a_n$ are arbitrary. Thus there will be a solution $x$ if, and only if, $b_n$ vanishes
whenever $\lambda_n$ does, or equivalently, $y \perp \ker T$. Separating the vectors $e_m$ that satisfy
$Te_m = 0$ from the rest, the complete solution is
$$x = \sum_{m:\,\lambda_m = 0} a_m e_m + \sum_{n:\,\lambda_n \ne 0} \frac{b_n}{\lambda_n}\,e_n,$$
where $a_m$ are arbitrary constants. The first series is a solution of the "homogeneous
equation" $Tx = 0$, while the second series is a "particular solution" of $Tx = y$.

For the case of the Hilbert space $L^2(A)$, with $e_n$ and $y = f$ all functions, the
particular solution can be rewritten as
$$\sum_n \frac{b_n}{\lambda_n}\,e_n(x) = \sum_n \frac{\langle e_n, f\rangle}{\lambda_n}\,e_n(x) = \int_A \Big(\sum_n \frac{\overline{e_n(s)}\,e_n(x)}{\lambda_n}\Big) f(s)\,ds.$$
The kernel $G(x, s) := \sum_n \overline{e_n(s)}\,e_n(x)/\lambda_n$ is called the Green's function of the operator $T$.
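
As a worked illustration of the diagonal case, consider an example that is not taken from the text: $T = -D^2$ acting on functions of $L^2[0, 1]$ that vanish at the endpoints, for which $e_n(x) = \sqrt{2}\sin(n\pi x)$ are orthonormal eigenvectors with $\lambda_n = n^2\pi^2$. For the right-hand side $f = 1$ the solution of $-u'' = 1$, $u(0) = u(1) = 0$, is $u(x) = x(1 - x)/2$, and the series $\sum_n (b_n/\lambda_n) e_n$ can be compared with it.

```python
import math

def eigen_solution(x, terms=200):
    """Partial sum of sum_n (b_n / lambda_n) e_n(x) for T = -D^2 on [0, 1] with
    zero boundary values and right-hand side f = 1 (assumptions of this sketch)."""
    total = 0.0
    for n in range(1, terms + 1):
        # b_n = <e_n, 1> = sqrt(2) (1 - cos(n pi)) / (n pi)
        b_n = math.sqrt(2) * (1 - math.cos(n * math.pi)) / (n * math.pi)
        lam_n = (n * math.pi) ** 2
        total += b_n / lam_n * math.sqrt(2) * math.sin(n * math.pi * x)
    return total

def exact_solution(x):
    """Closed-form solution of -u'' = 1 with u(0) = u(1) = 0."""
    return x * (1 - x) / 2

midpoint_error = abs(eigen_solution(0.5) - exact_solution(0.5))
```

The coefficients $b_n/\lambda_n$ decay like $1/n^3$ here, so a modest number of terms already gives a close approximation.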
Gaussian quadrature


A central problem in numerical analysis is to find an approximation for the integral
of a real function, in the form
$$\int_a^b f \approx a_1 f(x_1) + \cdots + a_N f(x_N) =: \phi(f),$$
where $a_n$, $x_n$ are fixed numbers; note that $\phi$ is a functional acting on $f$. The familiar
trapezoid rule and Simpson's rule are of this type, where the $x_n$ are equally spaced
along $[a, b]$. The question arises as to whether we can do better by choosing $x_n$ in
some other optimal way.

Let $e_n(x)$ be real orthonormal polynomials of degree $n$ in the space $L^2[a, b]$,
obtained from $1, x, x^2, \ldots$, by the Gram-Schmidt process. By orthogonality, their
integrals vanish since $\int_a^b e_n = \langle 1, e_n\rangle = 0$, except for $\int_a^b e_0 = \|1\|_{L^2[a,b]}$. Certainly,
for $\phi(e_n)$ to agree with the integral $\int_a^b e_n$ for $n = 0, \ldots, N - 1$, we must require
$$\begin{pmatrix} e_0(x_1) & \cdots & e_0(x_N) \\ e_1(x_1) & \cdots & e_1(x_N) \\ \vdots & & \vdots \\ e_{N-1}(x_1) & \cdots & e_{N-1}(x_N) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix} = \begin{pmatrix} \|1\| \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
which can be solved for $a_n$ when $x_n$ are known. The main point of Gaussian quadrature
is that if $x_n$ are chosen to be the $N$ roots of the polynomial $e_N(x)$ (assuming they lie
in $[a, b]$), we also get $\int_a^b e_n = 0 = \phi(e_n)$ for $n \le 2N - 1$.

For consider the division of any $e := e_m$ ($N \le m \le 2N - 1$) by $e_N$, $e = q e_N + r$
where $q$ and $r$ are real polynomials of degree at most $N - 1$. Then, as $e_0$ is proportional
to 1, and $q \in [\![1, x, \ldots, x^{N-1}]\!] = [\![e_0, \ldots, e_{N-1}]\!]$,
$$0 = \langle 1, e\rangle = \int_a^b (q e_N + r) = \langle q, e_N\rangle + \langle 1, r\rangle = \langle 1, r\rangle.$$
Hence $r = \sum_{k=1}^{N-1} b_k e_k$ for some scalars $b_k$, and by the choice of the coefficients $a_n$,
and $e_N(x_n) = 0$,
$$e(x_n) = q(x_n) e_N(x_n) + r(x_n) = r(x_n),$$
$$\text{so}\quad \phi(e) = \sum_{n=1}^{N} a_n e(x_n) = \sum_{n=1}^{N} a_n r(x_n) = \sum_{k=1}^{N-1} b_k \sum_{n=1}^{N} a_n e_k(x_n) = 0 = \int_a^b e.$$
Thus the integral of any $f = \sum_n \alpha_n e_n \in L^2[a, b]$ agrees with $\phi(f)$ up to order
$n = 2N - 1$,
$$\int_a^b f = \sum_n \alpha_n \int_a^b e_n = \sum_{n=0}^{2N-1} \alpha_n \phi(e_n) \approx \phi(f).$$
The residual error can be made as small as needed by taking a larger $N$.
For example, using the Legendre polynomials, (prove!)
$$\int_{-1}^{1} f(x)\,dx \approx 0.35 f(-0.86) + 0.65 f(-0.34) + 0.65 f(0.34) + 0.35 f(0.86).$$
All this applies equally well for weighted $L^2_w(A)$ spaces; for example, using Laguerre
polynomials,
$$\int_0^{\infty} f(x) e^{-x}\,dx \approx 0.60 f(0.32) + 0.36 f(1.75) + 0.039 f(4.5) + 0.00054 f(9.4).$$
In practice, the algorithm of choice of most mathematics software is currently the 
Gauss-Kronrod algorithm, which performs Gaussian quadrature but refines it adap- 
tively by taking more evaluation points if necessary. 
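
The four-point Legendre rule quoted above can be reproduced numerically. The following sketch does not solve the matrix equation of the text; instead it uses two standard facts assumed here without proof: the roots of the Legendre polynomial $P_N$ can be found by Newton's method, and the weights are given by $a_n = 2/\big((1 - x_n^2) P_N'(x_n)^2\big)$.

```python
import math

def legendre_and_derivative(n, x):
    """Return P_n(x) and P_n'(x) for the (unnormalized) Legendre polynomial, using
    the recurrence (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    if n == 0:
        return 1.0, 0.0
    previous, current = 1.0, x
    for k in range(1, n):
        previous, current = current, ((2 * k + 1) * x * current - k * previous) / (k + 1)
    derivative = n * (x * current - previous) / (x * x - 1)
    return current, derivative

def gauss_legendre(n):
    """Nodes (roots of P_n) and weights of the n-point rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # customary initial guess
        for _ in range(50):                               # Newton's method
            value, slope = legendre_and_derivative(n, x)
            x -= value / slope
        _, slope = legendre_and_derivative(n, x)
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * slope * slope))
    return nodes, weights

nodes, weights = gauss_legendre(4)
# Use the rule on f(x) = exp(x); the exact integral over [-1, 1] is e - 1/e.
estimate = sum(w * math.exp(x) for x, w in zip(nodes, weights))
```

Rounded to two decimal places, the computed nodes and weights agree with the values 0.86, 0.34 and 0.35, 0.65 in the formula above, and the rule integrates polynomials of degree up to 7 exactly, up to rounding.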


Signal Processing 


Sounds, images, and signals in general can be thought of as vectors in $L^2(\mathbb{R})$, $L^2(\mathbb{R}^2)$,
and $L^2(A)$ respectively. They can thus be decomposed into orthonormal sums with
all the advantages that that entails. Three applications are:


(a) Storing only the "largest" coefficients $\alpha_n := \langle e_n, x\rangle$ of an orthonormal expansion
leads to a useful compressed form of the vector $x$. Compression ratios of about
100 are quite typical. A close copy of $x$ can easily be regenerated from these
coefficients using $x = \sum_n \alpha_n e_n$. Although not identical to the original (because
the small terms were omitted), it may be good enough for the purpose, especially
since the smallest coefficients are usually unappreciated fine detail or noise.


(b) A vector can be altered intentionally by manipulating its coefficients. For example,
it can be improved by filtering out noise coefficients, or particular features
in a function may be picked out, e.g. image contrast may be enhanced if certain
coefficients are weighted more than others.


(c) A vector may be matched with a database of other vectors, by taking the inner
product with each of them, using Parseval's identity $\langle x, y\rangle = \sum_n \bar\alpha_n \beta_n$. That
vector with the largest correlation $\langle x, y\rangle$ gives the best match and can be selected
for further investigation.


Consequently, the storage, transmission, rapid retrieval, and comparison of images 
and sounds have seen a tremendous change in the past two decades, in part feeding 
the growth not only of the internet and mobile phones, but also of new scientific tools. 
For example, speech-, handwriting-, and face-recognition software find phonemes,
characters, and faces that best match the given input; an E.C.G./E.K.G. or E.E.G.
signal may be compared to a database for the early detection of cardiac arrest or
epileptic fits; the U.S. F.B.I. performs more than 50,000 fingerprint matches daily,
etc.

To see one application in some detail, let us look at one popular image format,
JPEG (1992 standard). Color images consist of an array of pixels, each digitized into
three numbers $(R, G, B) \in [0, 1]^3$ representing the red, green, and blue content.
In the JPEG algorithm, the three RGB color bytes for each pixel are usually first
converted to brightness, excess red, and excess blue,
$$Y := rR + gG + bB,$$
$$C_r := \frac{1}{2} + \frac{1}{2(g+b)}(R - Y),$$
$$C_b := \frac{1}{2} + \frac{1}{2(r+g)}(B - Y),$$
where $r \approx 0.25$, $g \approx 0.65$, $b \approx 0.1$ are agreed-upon constants such that $r + g + b = 1$.
This is done to avoid effects due to color-shifts and because the brightness picture
carries most of the visible information; in fact the excess red/blue pixels are reduced
in number by a factor of 4 because the eye is not sensitive to fine detail in pure color.

The image is then split into $8 \times 8$ blocks, and each block is expanded with respect
to the cosine basis $\cos(\pi n(x + \frac{1}{2})/8)\cos(\pi m(y + \frac{1}{2})/8)$ (the cosine transform is
preferred for positive functions in general because the first few coefficients are larger;
however it is not so good for sharp lines). The resulting 64 coefficients for each block
are discretized (by multiplying by a user-defined weight, and taking the integer part).
Most are now zero, and the rest are squeezed further using the standard Huffman
compression algorithm. This way, a 4 Mpixel image, that normally requires 12 million
bytes in raw formats, can easily be reduced a hundredfold in file-size without any
visible loss of quality. JPEG 2000 uses wavelets instead but works in essentially the
same way; MPEG is JPEG 1992 adapted to video.

Similarly a 5 min CD-quality stereo sound clip, sampled at 44,000 times 16 bits a
second, would normally need at least 52 Mbytes. It can be compressed to about 10%
of that by MP3, an algorithm that works in an analogous way to JPEG, but adapted
to sound signals.
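
The block-transform step can be imitated in a few lines. This is only a rough sketch of the idea, not the JPEG standard: the $8 \times 8$ block, the quantization step, and the use of rounding (rather than the integer part, with a single weight for all coefficients) are simplifying assumptions made here.

```python
import math

N = 8

def basis(n, x):
    """Orthonormal discrete cosine vector: c_n cos(pi n (x + 1/2) / 8)."""
    scale = math.sqrt(1 / N) if n == 0 else math.sqrt(2 / N)
    return scale * math.cos(n * math.pi * (x + 0.5) / N)

def dct2(block):
    """Coefficients of the block in the two-dimensional cosine basis."""
    return [[sum(block[x][y] * basis(n, x) * basis(m, y)
                 for x in range(N) for y in range(N))
             for m in range(N)] for n in range(N)]

def idct2(coeffs):
    """Rebuild the block from its cosine coefficients."""
    return [[sum(coeffs[n][m] * basis(n, x) * basis(m, y)
                 for n in range(N) for m in range(N))
             for y in range(N)] for x in range(N)]

# A smooth 8 x 8 brightness block (a gentle diagonal gradient), values in [0, 1].
block = [[(x + y) / 14 for y in range(N)] for x in range(N)]

step = 0.05                                   # quantization step (hypothetical)
quantized = [[round(c / step) for c in row] for row in dct2(block)]
kept = sum(1 for row in quantized for q in row if q != 0)   # non-zero integers to store
restored = idct2([[q * step for q in row] for row in quantized])
worst = max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N))
```

For a smooth block such as this one, only a small fraction of the 64 quantized coefficients is non-zero, while the restored block stays close to the original.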


Remarks 10.36


1. The norm on matrices in $B(\mathbb{C}^N, \mathbb{C}^M)$ that comes from the inner product defined
in Example 10.2(2) is not the same as that defined in Theorem 8.7 (but recall
that all norms on finite-dimensional Banach spaces are equivalent).


2. $\mathrm{Re}\,\langle x, y\rangle$ is a real-valued inner product (over the reals), but $\mathrm{Im}\,\langle x, y\rangle$ fails the
last two axioms.


3. A real inner product on the real vector space $X$ can be uniquely extended to its
complexification $X + iX$, by
$$\langle x_1 + i x_2, y_1 + i y_2\rangle := \big(\langle x_1, y_1\rangle + \langle x_2, y_2\rangle\big) + i\big(\langle x_1, y_2\rangle - \langle x_2, y_1\rangle\big).$$
Thus an inner product on $\mathbb{R}^N$ can extend in several ways to $\mathbb{R}^{2N}$, but in only one
way to $\mathbb{C}^N$.


4. There is an interesting analogy between linear subspaces and logic: Think of subspaces
as "statements", with $A \Rightarrow B$ meaning $A \subseteq B$, and FALSE, TRUE, $A$ AND
$B$, $A$ OR $B$, NOT $A$, corresponding to $0$, $X$, $A \cap B$, $A + B$, and $A^\perp$, respectively.
What are the logical rules that correspond to Proposition 10.9? Are all classical
logic rules true in this sense?


5. The polarization identity states that a complex inner product $\langle x, y\rangle$ is a weighted
average of lengths on a circle of radius $\|x\|$, centered at $y$. It can be generalized
further: if $\omega^N = 1$ ($N > 2$), then
$$\langle x, y\rangle = \frac{1}{N}\sum_{n=1}^{N} \omega^n\,\|y + \omega^n x\|^2.$$
Even more generally, $\langle x, y\rangle = \frac{1}{2\pi i}\oint \|y + zx\|^2\,dz$.


6. A normed space with a conjugate-linear "isomorphism" $J : X \to X^*$, has a
sesquilinear product $\langle x, y\rangle := (x^* y + \overline{y^* x})/2$ (where $x^* := Jx$). The additional
property $x^* x = \|x\|^2$ turns it into an inner product space, compatible with the
norm of $X$.


7. The conjugate gradient method is an iteration to solve $T^*Tx = y$, used especially
when $T$ is a very large matrix. Note that $\langle\!\langle x, y\rangle\!\rangle := \langle x, T^*Ty\rangle$ is an inner
product when $T$ is 1-1. If $e_i$ were an orthonormal basis with respect to this inner
product, and $x = \sum_i a_i e_i$, then
$$a_i = \langle\!\langle e_i, x\rangle\!\rangle = \langle e_i, T^*Tx\rangle = \langle e_i, y\rangle,$$
and $x$ can be found. The iteration is essentially the Gram-Schmidt process applied
to the residual vectors $r_n := y - T^*Tx_n$, while calculating the approximate
solutions $x_n$ on the go, ($|||x|||^2 := \langle\!\langle x, x\rangle\!\rangle$)
$$e_0 := y/|||y|||, \qquad e'_{n+1} := r_n - \langle\!\langle e_n, r_n\rangle\!\rangle e_n, \qquad e_{n+1} := e'_{n+1}/|||e'_{n+1}|||,$$
$$x_0 := \langle e_0, y\rangle e_0, \qquad x_{n+1} := x_n + \langle e_{n+1}, y\rangle e_{n+1},$$
$$r_0 := y - T^*Tx_0, \qquad r_{n+1} := y - T^*Tx_{n+1}.$$


8. QR decomposition: any operator $T : X \to Y$ between Hilbert spaces maps
an orthonormal basis $e_i \in X$ to a sequence of vectors $Te_i \in Y$. If these are
orthonormalized to $\tilde e_j$ using the Gram-Schmidt process, then $Te_i = \sum_{j=1}^{i} \alpha_{ji}\tilde e_j$.
This means that, with respect to the bases $e_i$ and $\tilde e_j$, $T$ has the upper-triangular
matrix $R$. If $Q$ represents the change of bases in $Y$ from $\tilde e_j$ to the original one,
then the matrix of $T$ is $QR$.
9. A continuous function $f : [0, 2\pi] \to \mathbb{C}$, $f(0) = f(2\pi)$, traces out a looped path
or 'orbit' in the complex plane. If the Fourier coefficients are written in polar
form, it is clear that each term $a_n e^{in\theta} = r_n e^{i(n\theta + \phi_n)}$ describes a circle; and the
sum of two terms describes the motion along a circle whose center also moves
in a circle. The whole Fourier sum then represents a motion along progressively
smaller circles. Ptolemy and other Greek astronomers were the first to describe
a periodic motion in terms of these cycles within cycles.


10. A non-separable Hilbert space is still isomorphic to an $\ell^2(A)$ space, one with
an uncountable number of orthonormal basis vectors. For example every Hilbert
space with an orthonormal basis $\{e_t\}$ where $t \in [0, 1]$ is isomorphic to the space
$\ell^2[0, 1]$ consisting of functions $a_t$ for which $\|a\|^2 = \sum_t |a_t|^2 < \infty$. (Note: $a_t$
can take only a countable number of non-zero values.)


11. The first important application of the least-squares method was by Gauss. In
1801, G. Piazzi found the long-sought 'missing' planet between the orbits of
Jupiter and Mars, but could not observe it again after it went behind the Sun.
Gauss managed to recover its orbital parameters from Piazzi's observations,
and Ceres was relocated almost a year after its discovery. Essentially the same
techniques were used in 1846 to predict the location of a new planet, Neptune,
from the irregularities in the observed positions of Uranus.


12. There is a discrete version of the Fourier basis, on $L^2[0, 1]$, called the Walsh
basis, which consists of step functions. For each $N = 1, 2, \ldots$, there are $2^N$
Walsh basis functions, each with a step-width of $1/2^N$, and the list of heights are
the normalized column vectors of the Hadamard matrices.
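
As a companion to the remark on the conjugate gradient method, the following sketch solves $T^*Tx = y$ for a small matrix. It uses the commonly taught form of the iteration (with search directions and step lengths) rather than the Gram-Schmidt formulation written in the remark; the two are meant to describe the same method, but the code should be read as an illustration under that assumption. The matrix $T$ and the vector $y$ are made up.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, y, steps=None, tol=1e-24):
    """Solve A x = y for a real symmetric positive-definite matrix A (e.g. A = T*T)."""
    x = [0.0] * len(y)
    r = list(y)                    # residual y - A x (x starts at 0)
    p = list(r)                    # current search direction
    rr = dot(r, r)
    for _ in range(steps or 10 * len(y)):
        if rr < tol:               # residual is negligible: stop
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        # New direction: the residual, made conjugate (A-orthogonal) to the old direction.
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

T = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]]   # an invertible (1-1) real matrix
A = [[sum(T[k][i] * T[k][j] for k in range(3)) for j in range(3)] for i in range(3)]  # T*T
y = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, y)
residual = max(abs(a - b) for a, b in zip(matvec(A, x), y))
```

In exact arithmetic the iteration terminates after at most as many steps as the dimension; in floating point a few extra steps may be taken before the residual falls below the tolerance.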


Chapter 11 
Banach Spaces 


In this chapter, we explore deeper into the properties of operators and functionals 
on general Banach spaces. At the same time, we generalize several definitions and 
propositions that hold for Hilbert spaces. As these spaces are, in many ways, very 
special and non-typical examples of Banach spaces, we need to modify these results 
in several technical ways: There are no orthonormal bases, or Riesz correspondence, 
or orthogonal projections available in Banach spaces. 


11.1 The Open Mapping Theorem 


The following theorem holds the key to several unanswered questions that were 
raised earlier. 


Theorem 11.1 The Open Mapping Theorem 


Every onto continuous linear map between Banach spaces maps open sets 
to open sets. 


Proof Let $T : X \to Y$ be an onto operator between the Banach spaces $X$ and $Y$. Let
$U$ be an open subset of $X$, and let $x \in U$, so that $x \in B_\epsilon(x) \subseteq U$. If it can be shown
that $TB_X$ contains a ball $B_\delta(0)$, then
$$Tx \in B_{\delta\epsilon}(Tx) = Tx + \epsilon B_\delta(0) \subseteq Tx + \epsilon\,TB_X = TB_\epsilon(x) \subseteq TU$$
implies that $TU$ is an open set in $Y$, proving the theorem.

Now $X = \bigcup_{n=1}^{\infty} B_n(0)$, so $TX = \bigcup_{n=1}^{\infty} TB_n(0)$. But $TX = Y$ is complete, so
by Baire's category theorem, not all the sets $TB_n(0)$ are nowhere dense: there must
be an $N$ such that $\overline{TB_N(0)}$ contains a ball. By re-scaling we find that $\overline{TB_X}$ contains
a ball $B_r(a)$. It follows that for every $y \in B_r(0)$ we have
$$a + y = \lim_{n\to\infty} Tx_n, \quad\text{for some } x_n \in B_X,$$
$$a - y = \lim_{n\to\infty} Tx'_n, \quad\text{for some } x'_n \in B_X,$$
$$y = \lim_{n\to\infty} T\Big(\frac{x_n - x'_n}{2}\Big) \in \overline{TB_X}$$
since $\|x_n - x'_n\| < 2$. Consequently we have that $B_r(0) \subseteq \overline{TB_X}$.

Claim: $\overline{TB_X} \subseteq TB_3(0)$. Let $y \in \overline{TB_X}$, so that there must be an $x_0 \in B_X$ such that
$\|y - Tx_0\| < r/2$; that is, $\|x_0\| < 1$ and $y - Tx_0 \in B_{r/2}(0) \subseteq \overline{TB_{1/2}(0)}$. But this
implies that there is an $x_1 \in B_{1/2}(0)$ such that $\|y - Tx_0 - Tx_1\| < r/4$. Continuing
in this fashion, we get a sequence $x_n$ such that
$$\|y - T(x_0 + \cdots + x_n)\| < \frac{r}{2^{n+1}}, \qquad \|x_n\| < \frac{1}{2^n}.$$
We can conclude that $x := \sum_n x_n$ converges absolutely, with $\|x\| \le \sum_{n=0}^{\infty} \frac{1}{2^n} = 2$,
and that $y = Tx \in TB_2[0] \subseteq TB_3(0)$.

Re-scaling the vectors in $B_r(0) \subseteq TB_3(0)$ gives $B_{r/3}(0) \subseteq TB_X$ and closes the
argument. □


Corollary 11.2 
Every bijective operator between Banach spaces is an isomorphism. 


With this fact, we are ready for the analogue of the first isomorphism theorem of 
vector spaces, which is a generalization of the corollary. 


Proposition 11.3


For any operator $T : X \to Y$ between Banach spaces,
$$X/\ker T \cong \operatorname{im} T \;\Leftrightarrow\; \operatorname{im} T \text{ is closed in } Y \;\Leftrightarrow\; \exists c > 0,\ \forall x \in X,\ \|x + \ker T\| \le c\|Tx\|.$$


Proof The mapping $J : x + \ker T \mapsto Tx$ is well-defined because $T(x + a) = Tx$
for any $a \in \ker T$. It is obviously onto $\operatorname{im} T$, is 1-1 because
$$Tx = 0 \;\Leftrightarrow\; x \in \ker T \;\Leftrightarrow\; x + \ker T = \ker T,$$
is trivially linear, and continuous since, for $a_n \in \ker T$ chosen to satisfy
$\|x + a_n\| \to \|x + \ker T\|$,
$$\|Tx\| = \|T(x + a_n)\| \le \|T\|\,\|x + a_n\| \to \|T\|\,\|x + \ker T\|.$$
So $J$ is an isomorphism precisely when $J^{-1}$ is continuous, i.e., when the stated
inequality holds (Proposition 8.12).

By the corollary to the open mapping theorem, this is the case if the range of
$J$, namely $\operatorname{im} T$, is complete (closed in $Y$ by Proposition 4.7). For the converse,
$X/\ker T$ is complete (Proposition 8.18), as must be any isomorphic copy such as
$\operatorname{im} T$. □


Examples 11.4


1. If $T \in B(X, Y)$ is an operator on Banach spaces, and $Y = \operatorname{im} T \oplus M$ for some
closed linear subspace $M$ of $Y$, then $\operatorname{im} T$ is closed in $Y$.

Proof The mapping $X/\ker T \to \operatorname{im} T$ defined in the proof above can be extended
to $(X/\ker T) \times M \to Y$ by $(x + \ker T, a) \mapsto Tx + a$; it is continuous and
bijective, hence an isomorphism. The conclusion follows since it sends the closed
set $(X/\ker T) \times \{0\}$ to $\operatorname{im} T$.


2. ▸ Let $T : X \to Y$ be a linear map between Banach spaces; its graph
$M := \{(x, Tx) : x \in X\}$ is a linear subspace of $X \times Y$, and the map $J : M \to X$,
defined by $J(x, Tx) := x$ is 1-1, onto, linear, and continuous.

Closed Graph Theorem: If $M$ is also closed in $X \times Y$, then it is a Banach subspace,
and the open mapping theorem implies that $J$ is an isomorphism, so that
$$\|Tx\|_Y \le \|(x, Tx)\| \le c\|x\|_X$$
and $T$ must be continuous.


3. ▸ It is important that $Y$ be complete for the open mapping theorem to be valid.
The identity map $\ell^1 \to \ell^\infty$ is continuous and 1-1, but $\ell^1$ is not isomorphic
to its image, because the latter is not complete (in the $\infty$-norm). For example,
$x_N := (1, \frac{1}{2}, \ldots, \frac{1}{N}, 0, \ldots)$ converge in the $\infty$-norm, but not to an $\ell^1$-sequence.

4. If $X$ has two complete norms, and $\|x\| \le c\,|||x|||$ for some fixed $c > 0$, then
the two norms are equivalent: the identity map $X_{|||\cdot|||} \to X_{\|\cdot\|}$ is continuous by
hypothesis, and obviously linear and bijective; so its inverse is also continuous.
Put differently, if two complete norms on $X$ are inequivalent, then one can find
vectors $x_n$ which are unit with respect to one norm, but growing indefinitely with
respect to the other. Clearly, this can only happen in infinite dimensions.
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Complementarity 
We are now in a position to answer an earlier question about projections: It is not 


always possible to project continuously to a closed subspace. The following propo- 
sition determines exactly when such a projection exists: 


Proposition 11.5 


There is a projection $P$ onto a closed linear subspace $M$ of a Banach space $X$ if, and only if,

\[ X = M \oplus N \]

for some closed linear subspace $N$. In this case $M = \operatorname{im} P$, $N = \ker P$, and

\[ M \oplus N \cong M \times N. \]

We say that $M$, $N$ are complementary closed subspaces.


Proof The forward implication has already been proved (Example 8.16(3)). Conversely, suppose $X = M \oplus N$, so that any $x = a + b$ for some $a \in M$, $b \in N$. Uniqueness of $a$, $b$ follows from

\[ a_1 + b_1 = x = a_2 + b_2 \;\Rightarrow\; a_1 - a_2 = b_2 - b_1 \in M \cap N = \{0\} \;\Rightarrow\; a_1 = a_2 \text{ and } b_1 = b_2. \]

This allows us to define the function $P : X \to X$ by $P(x) := a$. It is linear since

\[ P(\lambda x_1 + x_2) = P(\lambda a_1 + \lambda b_1 + a_2 + b_2) = \lambda a_1 + a_2 = \lambda P x_1 + P x_2. \]

When $x$ belongs to $M$ or $N$, we get the special cases

\[ \forall a \in M,\; Pa = P(a + 0) = a; \qquad \forall b \in N,\; Pb = P(0 + b) = 0, \]

so $\operatorname{im} P = M$ and $\ker P = N$, since any $x \in \ker P$ satisfies $0 = Px = a$, implying $x = b \in N$.

$P$ is a continuous projection: $P^2 = P$ since, for any $x = a + b \in M \oplus N$, $P^2 x = Pa = a = Px$. Finally, the map $J : M \times N \to X$, $J(a, b) := a + b$, between Banach spaces, is 1-1, onto and continuous,

\[ \|a + b\|_X \le \|a\|_M + \|b\|_N = \|(a, b)\|_{M \times N}, \]

and so is an isomorphism by the open mapping theorem. Therefore

\[ \|Px\| = \|a\| \le \|a\| + \|b\| = \|(a, b)\|_{M \times N} \le c\|a + b\|_X = c\|x\|. \]
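The construction of $P$ from a decomposition $X = M \oplus N$ can be tried out in the simplest case. The following is a minimal sketch, assuming $X = \mathbb{R}^2$ with $M$ and $N$ the lines spanned by two given vectors; the helper names `decompose` and `project_onto_M` are illustrative and not part of the text.

```python
from fractions import Fraction

def decompose(m, n, x):
    """Split x = a + b with a in span{m} and b in span{n}, for vectors in R^2.

    Solves s*m + t*n = x by Cramer's rule; this is possible exactly when
    span{m} and span{n} are complementary (non-zero determinant).
    """
    det = m[0] * n[1] - m[1] * n[0]
    if det == 0:
        raise ValueError("span{m} and span{n} are not complementary")
    s = Fraction(x[0] * n[1] - x[1] * n[0], det)
    t = Fraction(m[0] * x[1] - m[1] * x[0], det)
    a = (s * m[0], s * m[1])
    b = (t * n[0], t * n[1])
    return a, b

def project_onto_M(m, n, x):
    """P(x) := a, the projection onto span{m} along span{n}."""
    return decompose(m, n, x)[0]

m, n = (1, 0), (1, 1)
x = (3, 2)
a, b = decompose(m, n, x)
print(a, b)                                            # the components of x in M and N
print(project_onto_M(m, n, a) == a)                    # P fixes M, so P^2 = P
print(project_onto_M(m, n, n) == (0, 0))               # N is the kernel of P
```

In finite dimensions every such $P$ is automatically continuous; the open mapping argument above is what is needed to reach the same conclusion in a general Banach space.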




Every subspace $M$ can be extended by another subspace $N$ such that $X = M \oplus N$ (by extending a basis for $M$ to span $X$), but complementarity requires $M$, $N$ to be closed.


Examples 11.6 


1. Finite-dimensional subspaces are always complemented.

Proof The projection to $M = [\![e_1, \ldots, e_N]\!]$ is simply $x \mapsto \delta_1(x)e_1 + \cdots + \delta_N(x)e_N$, where $\delta_n$ are the dual basis for $M^*$ ($\delta_m(e_n) := \delta_{mn}$). Although $\delta_n$ are defined on $M$, they can be extended to $X^*$ as seen later (Theorem 11.17).

2. Finite-codimensional closed subspaces are complemented.

Proof Let $e_1 + M, \ldots, e_n + M$ be a basis for $X/M$, and let $N := [\![e_1, \ldots, e_n]\!]$ (complete). Then, for any $x$,

\[ x + M = \sum_{i=1}^{n} \alpha_i (e_i + M) = \sum_{i=1}^{n} \alpha_i e_i + M = a + M, \]

which shows $x - a \in M$, $a \in N$, so $x \in M + N$. If $x \in M \cap N$, then the above identity gives $M = \sum_{i=1}^{n} \alpha_i (e_i + M)$, so $\alpha_i = 0$ (linear independence of $e_i + M$) and $x = 0$.

3. For Banach spaces, if $T : X \to Y$ is onto, and $X = \ker T \oplus M$, then it follows that $Y \cong X/\ker T \cong M$ is embedded in $X$.


4. * A Banach theorem: If $X$ is a separable Banach space, then there is an onto operator $T : \ell^1 \to X$.

('Proof' Let $x_n$ be dense in $B_X$, and let $T : \ell^1 \to X$ be defined by $T(e_n) = x_n$, extended linearly. Then it follows easily that $\|T\| = 1$, so $T B_{\ell^1}$ is dense in $B_X$, and by an argument similar to the proof of the open mapping theorem, $T B_{\ell^1} = B_X$.)
Hence, if $X$ is not embedded in $\ell^1$, then $\ker T$ is not complemented (by the previous example).


Exercises 11.7 


1. For a projection $P : X \to X$, $X/\operatorname{im} P \cong \ker P$, while $\|x + \ker P\| \le \|Px\|$.

2. Second isomorphism theorem: If $M$, $N$, and $M + N$ are closed subspaces of a Banach space, then $(M + N)/N \cong M/(M \cap N)$, using the map $M \to (M + N)/N$, $x \mapsto x + N$.

3. Third isomorphism theorem: Let $M \subseteq N$ be closed subspaces of $X$; then

\[ \frac{X/M}{N/M} \cong \frac{X}{N} \]

using the map $X/M \to X/N$, $x + M \mapsto x + N$. If $M$ is finite-codimensional then $\operatorname{codim} N \le \operatorname{codim} M$.

4. Let $T : X \to Y$ and $S : X \to Z$ be operators on Banach spaces.

(a) If $M$ is a closed linear subspace of $\ker T$, then $x + M \mapsto Tx$ is well defined, linear, and continuous.

(b) If $S$ is onto and $Sx = 0 \Rightarrow Tx = 0$, then $Sx \mapsto Tx$ is a well-defined operator in $B(Z, Y)$.

5. * Suppose the Banach space $X$ has a Schauder basis $e_n$ (of unit norm). For $x = \sum_n \alpha_n e_n$, it can be shown that $|||x||| := \sup_n \|\sum_{i=1}^{n} \alpha_i e_i\|$ exists and is a complete norm. Show $\|x\| \le |||x|||$ and deduce that the map $\phi_n : x \mapsto \alpha_n$ is in $X^*$. These functionals form a Schauder basis for $X^*$, called the bi-orthogonal or dual basis, and satisfy $\phi_n(e_m) = \delta_{nm}$.

6. Let $M$, $N$ be closed subspaces of a Banach space, with $M \cap N = \{0\}$. Then

\[ M + N \text{ is closed} \iff P : M + N \to M,\; x + y \mapsto x, \text{ is continuous.} \]

7. If $\phi : X \to \mathbb{F}$ is linear with $\ker \phi$ closed, then $\phi$ is continuous.

8. If $M$ is a complemented closed subspace of $X$, then $X \cong \frac{X}{M} \times M$.

9. If $X = M \oplus N$ with $M$, $N$ closed, then there is a minimum separation $\|u - v\| \ge c$ between any unit vectors $u \in M$, $v \in N$.


11.2 Compact Operators 


A linear map is continuous when it maps bounded sets to bounded sets. There is a 
special subclass of linear maps that go further: 


Definition 11.8 


A linear mapping between Banach spaces is called compact when it maps 
bounded sets to totally bounded sets. 


Easy Consequences 


1. Compact linear maps are continuous (originally called completely continuous).

2. If $T$, $S$ are compact operators, then so are $T + S$ and $\lambda T$ (since $B$ bounded implies $\lambda T B$ and subsets of $TB + SB$ are totally bounded (Proposition 7.13)).

3. The identity map $I : X \to X$ is not compact when the Banach space is infinite dimensional (it cannot convert the unit ball to a totally bounded set (Proposition 8.23)).

4. It is enough to show that $T$ maps the unit ball to a totally bounded set for $T$ to be compact (since $B \subseteq B_r(0) \Rightarrow TB \subseteq rTB_X$).

Proposition 11.9

(i) If $T$ is compact and $S$ continuous linear, then $ST$ and $TS$ are compact (when defined).

(ii) If $T_n$ are compact and $T_n \to T$ then $T$ is compact.

(iii) For a compact operator $T$, $\operatorname{im} T$ is separable, and is closed only when finite-dimensional.


Proof (i) Starting from a bounded set, $T$ maps it to a totally bounded set and $S$, being Lipschitz, maps this to another totally bounded set (Proposition 6.7); or starting with a bounded set, $S$ maps it to another bounded set (Exercise 4.17(3)), which is then mapped by $T$ to a totally bounded set.

(ii) Let $B$ be a bounded set, with its vectors having norm at most $c$. Then for any $x \in B$, $Tx = T_n x + (T - T_n)x$, and

\[ \|(T - T_n)x\| \le \|T - T_n\|\,\|x\| \le c\|T - T_n\| \to 0. \]

Hence for $n$ large enough, independent of $x \in B$, $\|(T - T_n)x\| < \epsilon/2$; in other words $(T - T_n)B \subseteq B_{\epsilon/2}(0)$. Moreover $T_n B$ is totally bounded and so,

\[ TB \subseteq T_n B + (T - T_n)B \subseteq \bigcup_{i=1}^{N} B_{\epsilon/2}(x_i) + B_{\epsilon/2}(0) = \bigcup_{i=1}^{N} B_\epsilon(x_i). \]

Thus $TB$ is totally bounded and $T$ is compact.

(iii) Totally bounded sets are separable (Example 6.6(3)), so the image of $T$,

\[ \operatorname{im} T = TX = T \bigcup_{n=1}^{\infty} B_n(0) = \bigcup_{n=1}^{\infty} T B_n(0), \]

being the countable union of separable sets, is separable (Exercise 4.21(3)).

If $\operatorname{im} T$ is complete, then it is a Banach space in its own right. The open mapping theorem can be used to conclude that the unit (open) ball $B_X$ is mapped to an open and totally bounded set $TB_X \subseteq \operatorname{im} T$. As 0 is an interior point of it, there is a totally bounded ball $B_r(0) \cap \operatorname{im} T \subseteq TB_X$. This can only happen if $\operatorname{im} T$ is finite dimensional. □


Examples 11.10 


1. An operator whose image has finite dimension (finite rank) is compact. The reason is that, in a finite-dimensional space, bounded sets are necessarily totally bounded (Proposition 8.23, Exercise 6.9(5)). For example, matrices and functionals are compact operators of finite rank.

2. ▸ A common way of showing that an operator is compact is to show that it is the limit of operators of finite rank.
For example, let $T : \ell^2 \to \ell^2$ be defined by $T(a_n) := (a_n/n)$. First truncate the operator to $T_N$ defined by $T_N(a_n) := (a_1/1, a_2/2, \ldots, a_N/N, 0, 0, \ldots)$. This operator maps $\ell^2$ to an $N$-dimensional space. Showing it is continuous would imply it is compact of finite rank:

\[ \|T_N(a_n)\|_{\ell^2}^2 = \sum_{n=1}^{N} |a_n/n|^2 \le \sum_{n=1}^{N} |a_n|^2 \le \|(a_n)\|_{\ell^2}^2. \]

Furthermore, $T_N \to T$:

\[ \|(T - T_N)(a_n)\|_{\ell^2}^2 = \sum_{n=N+1}^{\infty} |a_n/n|^2 \le \frac{1}{N^2} \sum_{n=N+1}^{\infty} |a_n|^2 \le \|(a_n)\|_{\ell^2}^2 / N^2. \]

Hence $\|T - T_N\| \le 1/N \to 0$ as $N \to \infty$, as required.
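The tail estimate can be checked numerically on finitely supported sequences. The sketch below assumes sequences are stored as Python lists indexed from $n = 1$; the function names are illustrative only, and the computation is a sanity check of the bound $1/N$, not a proof.

```python
import math

def T(a):
    """Diagonal operator (a_n) -> (a_n / n), with n starting at 1."""
    return [x / n for n, x in enumerate(a, start=1)]

def T_N(a, N):
    """Finite-rank truncation: keep the first N coordinates of T(a), zero the rest."""
    return [x / n if n <= N else 0.0 for n, x in enumerate(a, start=1)]

def norm2(a):
    """The l^2 norm of a finitely supported sequence."""
    return math.sqrt(sum(x * x for x in a))

def tail_ratio(a, N):
    """||(T - T_N)a|| / ||a||, which the estimate bounds by 1/N."""
    diff = [u - v for u, v in zip(T(a), T_N(a, N))]
    return norm2(diff) / norm2(a)

N = 5
e = [0.0] * 10
e[N] = 1.0                      # the basis vector e_{N+1} (list index N)
print(tail_ratio(e, N))         # equals 1/(N+1) up to rounding
print(tail_ratio([1.0] * 20, N) <= 1 / N)
```

On the basis vector $e_{N+1}$ the ratio is $1/(N+1)$, which suggests that the bound $1/N$ is close to the actual value of $\|T - T_N\|$.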


3. $T_N f(x) := \sum_{n=-N}^{N} \hat{f}(n) e^{2\pi i n x}$ is an example of an operator of finite rank on $L^1[0, 1]$.

4. ▸ If $T$ is a compact operator on Banach spaces and $(x_n)$ is bounded, then $(Tx_n)$ has a convergent subsequence.

Proof The sequence $(Tx_n)$ is totally bounded, hence has a Cauchy subsequence, which converges by virtue of the completeness of the codomain.


An important source of examples of compact operators is the following: 


Proposition 11.11 


If the kernel $k$ is a continuous function $[a, b] \times [c, d] \to \mathbb{C}$, then the integral operator $T : C[a, b] \to C[c, d]$,

\[ Tf(y) := \int_a^b k(x, y) f(x)\,dx, \]

is compact.

Proof Let $F$ be the unit ball of functions in $C[a, b]$. For any $y \in [c, d]$, and $f \in F$,

\[ |Tf(y)| \le (b - a)\|k\|_{L^\infty}\|f\|_{L^\infty} \le (b - a)\|k\|_{L^\infty}, \]

so $(TF)[c, d]$ is bounded in $\mathbb{C}$, hence totally bounded.

As $k$ is continuous on the compact set $[a, b] \times [c, d]$, it is uniformly continuous (Proposition 6.17). So for any $\epsilon > 0$ there is a $\delta > 0$ such that for $|y_1 - y_2| < \delta$,

\[ |Tf(y_1) - Tf(y_2)| \le \int_a^b |k(x, y_1) - k(x, y_2)|\,|f(x)|\,dx \le \epsilon(b - a). \]

This implies that $Tf$ is continuous and, as $\delta$ is independent of $f$, $TF$ is equicontinuous. By the Arzela-Ascoli theorem (Theorem 6.26), $TF$ is totally bounded in $C[c, d]$, and the integral operator $T$ is compact. □


Fredholm Operators 
Definition 11.12 
A Fredholm operator is one whose kernel is finite-dimensional and whose image has finite codimension. The index of a Fredholm operator is the difference

\[ \operatorname{index}(T) := \dim \ker T - \operatorname{codim} \operatorname{im} T. \]

A Fredholm operator $T : X \to Y$ gives rise to decompositions

\[ X = \ker T \oplus M, \qquad Y = \operatorname{im} T \oplus N, \]

for some closed linear subspaces $M$, $N$ by Examples 11.6(1, 2) and 11.4(1). The restricted operator $R : M \to \operatorname{im} T$, $x \mapsto Tx$, is then bijective and continuous, and thus an isomorphism by the open mapping theorem.


Proposition 11.13 Index Theorem 


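The composition of Fredholm operators is again Fredholm, and

\[ \operatorname{index}(ST) = \operatorname{index}(S) + \operatorname{index}(T). \]

Proof Let $T \in B(X, Y)$, $S \in B(Y, Z)$, both Fredholm, with $n := \dim(\ker T)$, $m := \operatorname{codim}(\operatorname{im} S)$. $Y$ decomposes as

\[ Y = N \oplus \operatorname{im} T = \ker S \oplus M = A \oplus B \oplus C \oplus D \]

where

$A := \ker S \cap N$ of dimension $a$,
$B := \operatorname{im} T \cap \ker S$ of dimension $b$,
$C := M \cap N$ of dimension $c$,
$D := M \cap \operatorname{im} T$.

Then $\dim \ker ST = n + b$, $\operatorname{codim} \operatorname{im} ST = c + m$, both finite, and the index of $ST$ is $n + b - c - m = (a + b - m) + (n - a - c) = \operatorname{index}(S) + \operatorname{index}(T)$. □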
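In finite dimensions every operator is Fredholm, and the index theorem can be checked directly. The following is a small sketch assuming operators are given as matrices (lists of rows) with rational entries; the helpers `rank`, `index`, and `matmul` are written here for illustration only.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(A[0]) if A else 0
    for col in range(ncols):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

def index(rows):
    """dim ker T - codim im T for T : F^n -> F^m given as an m x n matrix."""
    m, n = len(rows), len(rows[0])
    r = rank(rows)
    return (n - r) - (m - r)

def matmul(S, T):
    """Matrix of the composition ST."""
    return [[sum(S[i][k] * T[k][j] for k in range(len(T)))
             for j in range(len(T[0]))] for i in range(len(S))]

T = [[1, 0], [0, 1], [0, 0]]    # T : F^2 -> F^3
S = [[1, 1, 1]]                 # S : F^3 -> F^1
print(index(T), index(S), index(matmul(S, T)))
```

Since the rank cancels, the index of an $m \times n$ matrix is always $n - m$, so additivity here reduces to $(n - p) = (m - p) + (n - m)$; the content of the theorem lies in the infinite-dimensional case.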


What is the connection with compact operators, one might ask? 


Proposition 11.14 


An operator $T : X \to Y$ on Banach spaces is Fredholm if, and only if, $T$ is invertible 'up to compact operators', that is, there exist $K_1 \in B(X)$, $K_2 \in B(Y)$ compact and $S \in B(Y, X)$, such that

\[ ST = I + K_1, \qquad TS = I + K_2. \]

In fact, $K_1$, $K_2$ can be taken to be of finite rank.


Proof Suppose $T$ is Fredholm, so $X = \ker T \oplus M$, $Y = \operatorname{im} T \oplus N$, and the map $R : M \to \operatorname{im} T$ is an isomorphism. Let $P$ be the finite-rank projection onto $\ker T$ with kernel $M$, and $Q$ that projection onto $N$ along $\operatorname{im} T$ ($\therefore QT = 0 = TP$), and let $S := R^{-1}(I - Q)$. Then $TR^{-1} = I$ and $R^{-1}T = I - P$, so for any $x \in X$, $y \in Y$,

\[ STx = R^{-1}(I - Q)Tx = R^{-1}Tx = (I - P)x, \]
\[ TSy = TR^{-1}(I - Q)y = (I - Q)y, \]

so $ST = I - P$, $TS = I - Q$.

Conversely,

\[ \ker T \subseteq \ker ST = \ker(I + K_1) =: M. \]

For $x \in M$, $K_1 x = -x$, i.e., $I = -K_1|_M$ is compact, hence $M$, and thus $\ker T$, are finite-dimensional. Similarly,

\[ \operatorname{im} T \supseteq \operatorname{im} TS = \operatorname{im}(I + K_2). \]

Now, in general, the operator $R := I + K$ has a closed image, for any compact operator $K$. For suppose there are vectors $y_n$ such that

\[ \|y_n + \ker R\| = 1 \quad\text{and}\quad Ry_n \to 0. \]

The first condition implies that there are vectors $v_n \in \ker R$ such that $1 \le \|u_n\| \le 2$, where $u_n := y_n - v_n$. As $K$ is compact and $u_n$ bounded, there is a subsequence $u_m$ such that $Ku_m \to -y$ for some $y$ (Example 11.10(4)). Subtracting this from $Ru_m = Ry_m \to 0$ gives $u_m \to y$. Consequently, $Ru_m$ converges to both $Ry$ and to 0, that is, $y \in \ker R$. However, $y_m - v_m \to y$ then contradicts the given condition that $y_m$ are at unit distance from $\ker R$. It must be the case that there is a constant $c > 0$ such that

\[ \|y + \ker R\| \le c\|Ry\|, \]

and $\operatorname{im} R$ is closed (Proposition 11.3).

It should be clear that the map $Y \to Y/\operatorname{im} R$ defined by

\[ y \mapsto Ky + \operatorname{im} R = (y + Ky) - y + \operatorname{im} R = -y + \operatorname{im} R \]

is both compact and onto, hence it is of finite rank. This means that $Y/\operatorname{im} R$, and by implication $Y/\operatorname{im} T$, are finite-dimensional. □


Exercises 11.15 


1. The multiplication operator $(a_n) \mapsto (b_n a_n)$ (on $\ell^1$, $\ell^2$, or $\ell^\infty$) is compact $\iff$ $b_n \to 0$.

2. The operator $V(a_n) := (0, a_0, a_1/2, a_2/3, \ldots)$ (on $\ell^1$, say) is compact. But the shift operators are not.

3. The operator $Tx := \sum_{n=1}^{N} (\phi_n x) y_n$, for any $\phi_n \in X^*$, $y_n \in Y$, is of finite rank. In the limit $N \to \infty$ it gives a compact operator if $\sum_n \|\phi_n\|\,\|y_n\| < \infty$. In fact, any operator of finite rank must be of this type, $Tx = \sum_{n=1}^{N} (\phi_n x) e_n$ with $\phi_n \in X^*$ and $e_n$ a basis for $\operatorname{im} T$.

4. If $S$, $T$ are linear of finite rank, then so are $\lambda T$ and $S + T$; if $S$ is any linear map, then $ST$ and $TS$ are of finite rank, when defined.

5. No isomorphism between infinite-dimensional Banach spaces $X$, $Y$ can be compact. If $T : X \to Y$ is compact and invertible, then $T^{-1}$ cannot be continuous.

6. If $T : X \to Y$ is compact, then so is its restriction to a closed subset $M \subseteq X$, $T|_M : M \to Y$.

7. The index of an $m \times n$ matrix is $n - m$.


8. The right-shift operator $R$ (on $\ell^\infty$, say) is Fredholm with index $-1$; that of the left-shift operator is $+1$. The index of a projection is 0 when defined.


11.3 The Dual Space X* 


Functionals provide very useful tools in converting vectors to numbers, and vector sequences to more amenable numerical sequences. Thus if we are uncertain whether $x_n \to x$ then we might try to see if $\phi x_n \to \phi x$ for some continuous functional; if it does not converge, neither does $x_n$. Moreover, $X^*$ is a sort of mirror-image, or dual, of $X$: every vector in $X$ can be thought of as a linear operator $x : \mathbb{F} \to X$, $\lambda \mapsto \lambda x$, while functionals are linear operators $\phi : X \to \mathbb{F}$, $x \mapsto \phi x$. It turns out that the space $X^*$ is at least as "rich" as the normed space $X$, in the sense that $X$ can be recovered from $X^*$ as a subspace of $X^{**}$.


Examples 11.16 


1. The functionals of a Hilbert space are in 1-1 correspondence with the vectors by 
the Riesz representation theorem. 


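2. Recall that $\ell^{1*} \cong \ell^\infty$, $\ell^{2*} \cong \ell^2$, and $c_0^* \cong \ell^1$ (Propositions 9.6, 9.9, and Theorem 9.3).

3. We will see later that every functional on $B(\mathbb{C}^n)$ is of the type $\phi T = \operatorname{tr}(ST)$ where $\operatorname{tr} S$ is the trace of the matrix $S$ (Theorems 15.31 and 10.16).

4. $(X \times Y)^* \cong X^* \times Y^*$, via the isomorphism $(\phi, \psi) \mapsto \omega$ where $\omega(x, y) := \phi x + \psi y$.

5. For $\phi_i, \psi \in X^*$, $\psi \bigcap_{i=1}^{n} \ker \phi_i = 0 \iff \psi \in [\![\phi_1, \ldots, \phi_n]\!]$.

Proof If $\psi \ker \phi = 0$ ($\phi \ne 0$), then the map $\mathbb{C} \to \mathbb{C}$, $\phi(x) \mapsto \psi(x)$ is well-defined and linear, hence must be multiplication by some scalar $\lambda$, i.e., $\psi = \lambda\phi$.
Suppose $\psi \bigcap_{i=1}^{n+1} \ker \phi_i = 0$. On the space $\ker \phi_{n+1}$, if $\phi_i x = 0$, $i = 1, \ldots, n$, then $\psi x = 0$, so by induction, $\psi = \sum_{i=1}^{n} \alpha_i \phi_i$ there. Let $\xi := \psi - \sum_{i=1}^{n} \alpha_i \phi_i$; then $\phi_{n+1} x = 0 \Rightarrow \xi x = 0$, so $\xi = \alpha_{n+1}\phi_{n+1}$ as required. The converse is easy.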


Our first result concerning functionals is a powerful theorem which asserts the 
existence of a functional on the whole space X, starting from a “fragment” of it on 
a, perhaps much smaller, subspace Y. Like many existence-type theorems, the path 
to construct such an extension is not straightforward. 


Theorem 11.17 The Hahn-Banach Theorem 


Let $Y$ be a subspace of a normed space $X$. Then every functional $\phi \in Y^*$ can be extended to some $\tilde\phi \in X^*$, with $\|\tilde\phi\|_{X^*} = \|\phi\|_{Y^*}$.


Proof Let us try to extend $\phi$ from a functional on $Y$ to a functional $\tilde\phi$ on $Y + [\![v]\!]$, for a vector $v \notin Y$, by selecting a number $\tilde\phi v := c$. Once $c$ is chosen, we are forced to set $\tilde\phi(y + \lambda v) := \phi y + \lambda c$, for any $\lambda v \in [\![v]\!]$, to make $\tilde\phi$ a linear extension of $\phi$; and to retain continuity with $\|\tilde\phi\| = \|\phi\|$, we need, for any $y \in Y$ and $\lambda \in \mathbb{F}$ ($\lambda \ne 0$),

\[ |\phi y + \lambda c| = |\tilde\phi(y + \lambda v)| \le \|\phi\|\,\|y + \lambda v\| \]
\[ \iff |\phi(y/\lambda) + c| \le \|\phi\|\,\|y/\lambda + v\| \]
\[ \iff |\phi y + c| \le \|\phi\|\,\|y + v\|, \qquad (11.1) \]

(since the vectors $y/\lambda$ account for all of $Y$). To proceed, we consider first the case of real scalars and then generalize to the complex field.


Real Normed Space: Let us suppose that $\phi$ is real-valued. Thus we are required to find a $c \in \mathbb{R}$ that satisfies inequality (11.1),

\[ -\phi y - \|\phi\|\,\|y + v\| \le c \le -\phi y + \|\phi\|\,\|y + v\|, \qquad \forall y \in Y. \]

Is this possible? Yes, because for any $y_1, y_2 \in Y$,

\[ \phi y_1 - \phi y_2 \le |\phi(y_1 - y_2)| \le \|\phi\|\,\|y_1 - y_2\| \]
\[ \le \|\phi\|\,(\|y_1 + v\| + \|y_2 + v\|) \]
\[ = \|\phi\|\,\|y_1 + v\| + \|\phi\|\,\|y_2 + v\| \]
\[ \Rightarrow\; -\phi y_2 - \|\phi\|\,\|y_2 + v\| \le -\phi y_1 + \|\phi\|\,\|y_1 + v\|. \]

Since $y_1$, $y_2$ are arbitrary vectors in $Y$, there must be a constant $c$ separating the two sides of the inequality, as sought. Choosing any such $c$ gives an extended functional with $\|\tilde\phi\| \le \|\phi\|$ (inequality (11.1)); but $\tilde\phi$ extends $\phi$, so $\|\tilde\phi\| = \|\phi\|$.
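The range of admissible constants $c$ can be made concrete in a small case. The sketch below assumes $X = \mathbb{R}^2$ with the 1-norm, $Y = \{(t, 0)\}$, $\phi(t, 0) = t$ (so $\|\phi\| = 1$) and $v = (0, 1)$; these choices, the sampling of $t$ over a finite grid, and the function names are all assumptions made for illustration.

```python
def norm1(p):
    """The 1-norm on R^2."""
    return abs(p[0]) + abs(p[1])

def admissible_interval(ts):
    """Bounds on c from inequality (11.1), sampled over y = (t, 0) for t in ts.

    Here phi(t, 0) = t has norm 1 and v = (0, 1), so y + v = (t, 1).
    """
    lower = max(-t - norm1((t, 1)) for t in ts)
    upper = min(-t + norm1((t, 1)) for t in ts)
    return lower, upper

def extension_norm_on_samples(c, points):
    """Largest ratio |x + c*y| / ||(x, y)||_1 over the non-zero sample points."""
    return max(abs(x + c * y) / norm1((x, y))
               for x, y in points if (x, y) != (0, 0))

grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
print(admissible_interval(range(-50, 51)))
print(extension_norm_on_samples(0.5, grid))   # c inside the interval
print(extension_norm_on_samples(2, grid))     # c outside the interval
```

In this example every $c$ in the sampled interval gives an extension $(x, y) \mapsto x + cy$ of the same norm, which illustrates that the extension is in general far from unique.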

Complex Normed Space: Now consider the case when the functional is complex-valued. It decomposes into its real and imaginary parts $\phi = \phi_1 + i\phi_2$, but the two are not independent of each other because

\[ \phi_1(iy) + i\phi_2(iy) = \phi(iy) = i\phi y = i\phi_1(y) - \phi_2(y), \]

so that $\phi_2(y) = -\phi_1(iy)$. Being real-valued, they cannot possibly belong to $Y^*$, but they do qualify as functionals on $Y$ when restricted to the real scalars,

\[ \phi_1(y_1 + y_2) = \operatorname{Re}(\phi(y_1) + \phi(y_2)) = \phi_1(y_1) + \phi_1(y_2), \]
\[ \phi_1(\lambda y) = \operatorname{Re} \lambda\phi(y) = \lambda\phi_1(y), \qquad \forall \lambda \in \mathbb{R}, \]
\[ |\phi_1(y)| = |\operatorname{Re} \phi y| \le |\phi y| \le \|\phi\|\,\|y\| \]

(for $\phi_2$, substitute Re with Im). So they have real-valued extensions $\tilde\phi_i$ to $Y + [\![v]\!]$ that are linear over the real scalars; actually, extending $\phi_1$ to $\tilde\phi_1$ automatically gives the extension for $\phi$. That is, define $\tilde\phi(x) := \tilde\phi_1(x) - i\tilde\phi_1(ix)$. This is obviously linear over the real scalars since $\tilde\phi_1$ is. It is also linear over the complex scalars because

\[ \tilde\phi(ix) = \tilde\phi_1(ix) - i\tilde\phi_1(-x) = i(-i\tilde\phi_1(ix) + \tilde\phi_1(x)) = i\tilde\phi(x). \]

Moreover it is continuous since, using the polar form $\tilde\phi x = |\tilde\phi x| e^{i\theta}$,

\[ |\tilde\phi x| = e^{-i\theta}\tilde\phi x = \tilde\phi(e^{-i\theta}x) = \tilde\phi_1(e^{-i\theta}x) \le \|\tilde\phi_1\|\,\|e^{-i\theta}x\| = \|\phi_1\|\,\|x\| \le \|\phi\|\,\|x\|, \]

so that $\|\tilde\phi\| \le \|\phi\|$; in fact, equality holds because the domain of $\tilde\phi$ includes that of $\phi$.

Extending to X: If $X$ can be generated from $Y$ and a countable number of vectors $v_n$, then $\phi$ can be extended in steps, first to some $\phi_1$ acting on $Y + [\![v_1]\!]$, then to $\phi_2$ acting on $Y + [\![v_1]\!] + [\![v_2]\!]$, etc. The final extension is then $\tilde\phi x := \phi_n x$ for $x \in X$, whenever $x \in Y + [\![v_1, \ldots, v_n]\!]$. If these vectors are only dense in $X$ (e.g. when $X$ is separable), $\tilde\phi$ can be extended further with the same norm via $\tilde\phi(x) := \lim_{n \to \infty} \tilde\phi(x_n)$ when $x_n \to x$, as a special case of extending a linear continuous function to the completion spaces (Example 8.9(4)).

But even if $X$ needs an uncountable number of generating vectors, "Hausdorff's maximality principle" can be applied to conclude that the extension goes through to $X$. Let $\mathcal{M}$ be the collection of functionals $\phi_M$ acting on linear subspaces $M$ containing $Y$ and extending $\phi$ with the same norm,

\[ \mathcal{M} := \{ \phi_M \in M^* : \forall y \in Y,\ \phi_M y = \phi y, \text{ and } \|\phi_M\| = \|\phi\| \}. \]

By Hausdorff's maximality principle, $\mathcal{M}$ contains a maximal chain of subspaces $\{ M_\alpha \}$, where $\phi_\beta$ extends $\phi_\alpha$ whenever $M_\alpha \subseteq M_\beta$. But $E := \bigcup_\alpha M_\alpha$ also allows an extension of $\phi$, namely $\psi(x) := \phi_\alpha x$ for $x \in M_\alpha$. It is well-defined because $x \in M_\alpha \cap M_\beta$ implies $M_\alpha \subseteq M_\beta$ say, so $\phi_\alpha x = \phi_\beta x$. It is linear and continuous with the same norm as $\phi$,

\[ |\psi x| = |\phi_\alpha x| \le \|\phi_\alpha\|\,\|x\| = \|\phi\|\,\|x\|. \]

Hence $\psi$ is a maximal extension in $\mathcal{M}$; in fact, $E = X$, for were it to exclude any vector $v$, the first part of the proof assures us of an extension that includes $v$, contradicting the maximality of $\psi$. □


Proposition 11.18 


For any $x \ne 0$, there is a unit $\phi \in X^*$ with $\phi x = \|x\|$.
More generally, if $M$ is a closed linear subspace and $x \notin M$, then there is a functional $\phi \in X^*$ with $\|\phi\| = 1$, such that

\[ \phi M = 0, \qquad \phi x \ne 0. \]

Proof If $x \ne 0$, there are non-zero functionals on $[\![x]\!]$, such as $\psi(\lambda x) := \lambda c$ ($c \ne 0$); in particular, to satisfy the requirement $\|\phi\| = 1$, choose $\phi(\lambda x) := \lambda\|x\|$. By the Hahn-Banach theorem, it has an extension to all of $X$, with the same norm.

More generally, given $x \notin M$, form the linear subspace

\[ Y := [\![x]\!] + M = \{ \lambda x + a : \lambda \in \mathbb{C},\ a \in M \}. \]

$Y^*$ contains the functional defined by $\psi(\lambda x + a) := \lambda\|x + M\|$. It is clearly zero when $\lambda = 0$ and is linear and continuous since

\[ \psi(\lambda_1 x + a_1 + \lambda_2 x + a_2) = (\lambda_1 + \lambda_2)\|x + M\| = \psi(\lambda_1 x + a_1) + \psi(\lambda_2 x + a_2), \]
\[ |\psi(\lambda x + a)| = |\lambda|\,\|x + M\| = \|\lambda x + M\| \le \|\lambda x + a\|, \]

and in fact $\|\psi\| = 1$,

\[ |\psi(x + a_n)| / \|x + a_n\| = \|x + M\| / \|x + a_n\| \to 1 \]

for $a_n \in M$ chosen so that $\|x + a_n\| \to \|x + M\|$ (Proposition 8.18). So $\psi$ can be extended to a functional $\phi$ on all of $X$ with the same norm. □


The Hahn-Banach theorem and its corollaries show that there is a ready supply of functionals on normed spaces; admittedly, this does not sound exciting, but consider that there are vector spaces (not normed), such as $L^p(\mathbb{R})$ with $p < 1$, that have only trivial continuous functionals. For our purposes, its greater importance lies in its ability to show a certain duality between $X$ and its space of functionals $X^*$. For example, the dual of the statement $\|\phi\| = \sup_{\|x\|=1} |\phi x|$ is

Proposition 11.19

\[ \|x\| = \sup_{\|\phi\|=1} |\phi x|, \qquad \|T\| = \sup_{\|\phi\|=1=\|x\|} |\phi T x|. \]

Proof $|\phi x| \le \|x\|$ for all unit $\phi \in X^*$. But the functional just constructed satisfies $\phi x = \|x\|$ and $\|\phi\| = 1$, so $\sup_{\|\phi\|=1} |\phi x| = \|x\|$.
This in turn allows us to deduce

\[ \|T\| = \sup_{\|x\|=1} \|Tx\| = \sup_{\|x\|=1}\, \sup_{\|\phi\|=1} |\phi T x|. \]

These identities generalize those for Hilbert spaces (Exercise 10.18(1)).
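The first identity can be seen at work in a finite-dimensional case. The sketch below assumes $X = \mathbb{R}^n$ with the 1-norm, whose functionals are vectors acting by the dot product with the $\infty$-norm; since the supremum of a convex function over the unit ball is reached at an extreme point, it is enough here to try the sign vectors. The function names are illustrative.

```python
from itertools import product

def norm1(x):
    """The 1-norm on R^n."""
    return sum(abs(t) for t in x)

def sup_over_sign_functionals(x):
    """max |phi . x| over the functionals phi in {-1, +1}^n, all of sup-norm 1."""
    return max(abs(sum(s * t for s, t in zip(phi, x)))
               for phi in product((-1, 1), repeat=len(x)))

x = [3, -4, 1]
print(norm1(x), sup_over_sign_functionals(x))
```

The maximum is attained at $\phi = (\operatorname{sign} x_i)$, which plays the role of the norming functional provided in general by Proposition 11.18.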


Proposition 11.20 Separating Hyperplane Theorem 


If $x \in X$ does not lie in the closed ball $B_r[0]$, then there is a 'hyperplane' $\phi^{-1}\alpha$ which separates the two, that is,

\[ \exists \phi \in X^*,\ \exists \alpha \in \mathbb{R},\ \forall y \in B_r[0], \quad |\phi y| < \alpha < |\phi x|. \]

Proof Let $\phi : [\![x]\!] \to \mathbb{F}$, $\phi(\lambda x) := \lambda\|x\|$; its norm is 1 and $\phi x = \|x\| > r$. It can be extended to a functional on $X$ with the same norm. Hence for any $y$ in the closed ball, $|\phi y| \le \|\phi\|\,\|y\| \le r$. The hyperplane is then $\lambda x + \ker \phi$ where $r/\|x\| < \lambda < 1$. □
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Note that the proof remains valid when B,[0] is replaced by a closed balanced 
convex set C since C + B,(0) determines a semi-norm in which it is the open unit 
ball (Exercise 7.7(8)). 


Examples 11.21 
1. The Hahn-Banach theorem and its corollaries are evident for Hilbert spaces: 


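(a) Any functional $\phi$ on a closed subspace $M$ corresponds to a vector $x \in M$, and hence has the obvious extension $\tilde\phi := \langle x, \cdot \rangle$ on $H$.

(b) $x = 0 \iff \forall y \in H,\ \langle y, x \rangle = 0$.

(c) $\|x\| = \sup_{y \ne 0} \dfrac{|\langle x, y \rangle|}{\|y\|} = \sup_{y^* \ne 0} \dfrac{|y^* x|}{\|y^*\|}$.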
(d) One hyperplane separating $x$ from $B_r[0]$ is $x^\perp + \alpha x/\|x\|$, ($r < \alpha < \|x\|$).
2. Operators do not extend automatically as functionals do: 


(a) If $M$ is a complemented closed subspace of $X$, then every operator $T : M \to Y$ can be extended continuously to $X \to Y$.

(b) If the identity map $I$ on the closed subspace $M$ can be extended to $X \to M$, then $M$ is complemented in $X$.

Proof Let $X = M \oplus N$ with $M$, $N$ closed subspaces, and define $\tilde{T}(a + b) := Ta$ for $a \in M$, $b \in N$. Then $\|\tilde{T}(a + b)\| = \|Ta\| \le c\|T\|\,\|a + b\|$ (Proposition 11.5).

If $\tilde{I} : X \to M$ is an extension of $I : M \to M$, then $\tilde{I}^2 x = I\tilde{I}x = \tilde{I}x$, so it is a projection in $B(X)$. $X$ then splits up as $\ker \tilde{I} \oplus \operatorname{im} \tilde{I}$, where $\operatorname{im} \tilde{I} = M$.


3. If X is not separable then neither is X*. 


Proof Assume $X^*$ separable, with $\phi_1, \phi_2, \ldots$ dense in it. By definition of their norm, there must be (unit) vectors $x_n$ such that for a fixed $\epsilon > 0$,

\[ |\phi_n x_n| \ge (\|\phi_n\| - \epsilon)\|x_n\|. \]

The claim is that $M := \overline{[\![x_n]\!]}$ is equal to $X$, making $X$ separable. For if not, then there is a unit functional $\psi \in X^*$ such that $\psi M = 0$; and there is a $\phi_n$ close to it, $\|\psi - \phi_n\| < \epsilon$, so

\[ |\phi_n x_n| = |(\psi - \phi_n) x_n| \le \|\psi - \phi_n\|\,\|x_n\| < \epsilon\|x_n\|. \]

Combining the two inequalities yields $\|\phi_n\| < 2\epsilon$, and this contradicts that $\phi_n$ is within $\epsilon$ of the unit functional $\psi$.


4. Banach Limits. The functional Lim on $c$ (Exercise 9.4(1)) can be extended (non-uniquely) to a functional on $\ell^\infty$.

Annihilators

Let us explore the duality between $X$ and $X^*$ more closely. The connection between the two is the following construction, which allows us to shuttle between subspaces of $X$ and those of $X^*$. It is the generalization of the orthogonal spaces in Hilbert spaces which, under the Riesz correspondence $J$, can be rewritten in terms of functionals,

\[ A^\perp = \{ x \in H : \langle x, a \rangle = 0,\ \forall a \in A \} \;\longleftrightarrow\; \{ \phi \in H^* : \phi a = 0,\ \forall a \in A \}. \]


Definition 11.22 


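The annihilator of a set of vectors $A \subseteq X$ is the set of functionals

\[ A^\perp := \{ \phi \in X^* : \phi x = 0,\ \forall x \in A \}. \]

Similarly, given a set of functionals $\Phi \subseteq X^*$, the pre-annihilator is

\[ {}^\perp\Phi := \{ x \in X : \phi x = 0,\ \forall \phi \in \Phi \}. \]

Easy Consequences

1. $0^\perp = X^*$, $X^\perp = 0$.
2. $A \subseteq B \Rightarrow B^\perp \subseteq A^\perp$.
3. $A \subseteq {}^\perp\Phi \iff \Phi A = 0 \iff \Phi \subseteq A^\perp$.

The properties of $A^\perp$ generalize those for Hilbert spaces, such as Proposition 10.9 and Exercise 10.14(4).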


Proposition 11.23 


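$A^\perp$ is a closed linear subspace of $X^*$ with the following properties:

(i) $(A \cup B)^\perp = A^\perp \cap B^\perp$ and $A^\perp + B^\perp \subseteq (A \cap B)^\perp$,

(ii) ${}^\perp(A^\perp) = \overline{[\![A]\!]}$,

(iii) $[\![A]\!]$ is dense in $X$ $\iff$ $A^\perp = 0$.

Proof That $A^\perp$ is a linear subspace is evident from

\[ \forall \phi, \psi \in A^\perp,\ a \in A,\ \lambda \in \mathbb{F}, \quad (\phi + \psi)a = \phi a + \psi a = 0, \quad (\lambda\phi)a = \lambda\phi a = 0. \]

Let $\phi_n \to \phi$ with $\phi_n \in A^\perp$; for any $a \in A$, $0 = \phi_n a \to \phi a$, so $\phi \in A^\perp$ and $A^\perp$ is closed.

(i) Clearly, $(A \cup B)^\perp$ is a subset of $A^\perp$ and $B^\perp$, while $\phi A = 0 = \phi B$ imply $\phi(A \cup B) = 0$. If $\phi \in A^\perp$, $\psi \in B^\perp$, and $x \in A \cap B$, then $(\phi + \psi)x = \phi x + \psi x = 0$.

(ii) ${}^\perp(A^\perp)$ is a closed linear subspace of $X$ (Ex. 2 below), and it contains $A$, since for $a \in A$ and any $\phi \in A^\perp$, $\phi a = 0$, so $a \in {}^\perp(A^\perp)$. Thus $\overline{[\![A]\!]} \subseteq {}^\perp(A^\perp)$ (Proposition 7.10).

Conversely, let $x \notin \overline{[\![A]\!]}$. Then by Proposition 11.18, there is a functional $\phi$ satisfying both $\phi\overline{[\![A]\!]} = 0$, hence $\phi \in A^\perp$, and $\phi x \ne 0$, hence $x \notin {}^\perp(A^\perp)$.

(iii) Consequently, $[\![A]\!]$ is dense precisely when ${}^\perp(A^\perp) = \overline{[\![A]\!]} = X$, and this is equivalent to $A^\perp = 0$ ($\forall x \in X,\ \phi x = 0 \iff \phi = 0$). □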


The Double Dual X** 


A functional $\phi$ is an assignment of numbers $\phi x$ as the vectors $x$ vary in $X$. Suppose we fix $x$ and vary $\phi$ instead, $\phi \mapsto \phi x$; what kind of object do we get? It is a mapping from $X^*$ to $\mathbb{F}$, which is a possible candidate for a "double" functional in $X^{**}$.

Proposition 11.24

For any $x \in X$, the map $x^{**}\phi := \phi x$ is a functional on $X^*$, and $x \mapsto x^{**}$ is a linear isometry, embedding $X$ in $X^{**}$.

Proof The mapping $x^{**} : X^* \to \mathbb{F}$, $\phi \mapsto \phi x$, is clearly linear in $\phi$, and continuous with $|x^{**}\phi| = |\phi x| \le \|x\|\,\|\phi\|$, i.e., $x^{**} \in X^{**}$.

Hence we can form the map $J : X \to X^{**}$, defined by $J(x) := x^{**}$. It is linear, since for any $\phi \in X^*$, $x, y \in X$, $\lambda \in \mathbb{F}$,

\[ (x + y)^{**}(\phi) = \phi(x + y) = \phi x + \phi y = x^{**}(\phi) + y^{**}(\phi), \]
\[ (\lambda x)^{**}(\phi) = \phi(\lambda x) = \lambda\phi x = \lambda x^{**}(\phi). \]

$J$ is isometric by Proposition 11.19, $\|x^{**}\| = \sup_{\|\phi\|=1} |x^{**}\phi| = \sup_{\|\phi\|=1} |\phi x| = \|x\|$. □

Examples 11.25 


1. Given any normed space $X$, the double dual $X^{**}$ is a Banach space. Hence the closure $\overline{JX}$, being a closed linear subspace of $X^{**}$, is itself a Banach space. It is isomorphic to the completion of $X$, denoted by $\widetilde{X}$.

2. Several Banach spaces, called reflexive spaces, have the property that the mapping $x \mapsto x^{**}$ is an isomorphism. Examples include $\ell^p$ ($p > 1$) and all Hilbert spaces (Proposition 10.16).

3. But in general, $X$ need not be isomorphic to $X^{**}$, even if $X$ is complete. For example, some elements of $(\ell^1)^{**}$ are not of the type $x^{**}$ for any $x \in \ell^1$.

4. In this embedding, $A \subseteq A^{\perp\perp}$ (since for $x \in A$ and $\phi \in A^\perp$, $x^{**}\phi = \phi x = 0$, so $x^{**} \in A^{\perp\perp}$). Note that $A^{\perp\perp}$ is always a closed linear subspace even if $A$ isn't. Question: if $M$ is a closed linear subspace, is it necessarily true that $M = M^{\perp\perp}$?

5. Since a functional is determined by its values on the unit sphere, we can think of the double-functional $x^{**}$ as a continuous function on the unit sphere in $X^*$; its norm is none other than its maximum value there, $\|x^{**}\| = \sup_{\|\phi\|=1} |\phi x|$. Hence the vectors of any normed space can be thought of as continuous functions on a (possibly infinite-dimensional) sphere!


Exercises 11.26 
1. $X^*$ distinguishes points: If $x \ne y$ then there is a $\phi \in X^*$ such that $\phi x \ne \phi y$.

2. If $x \notin [\![y]\!]$, find a functional on $X$ with $\phi x = 1$ and $\phi y = 0$.

3. For normed spaces, $X^* = 0 \iff X = 0$.

4. Show that the functional $\phi x := x$, $x \in \mathbb{R}$, has many equal-norm extensions to $\mathbb{R}^2$ with the 1-norm.

5. The set $\{ \phi \in X^* : \phi x = \|x\| \}$ (for a given $x$) is a non-empty convex subset of $X^*$ (called the set of "tangent functionals" at $x$).

6. Show that if $\{x\}^\perp = X^*$ then $x = 0$, and if $\{x\}^\perp = 0$ then $X \cong \mathbb{F}$ or $X = 0$.

7. Show ${}^\perp\Phi$ is a closed linear subspace of $X$.

8. $({}^\perp\Phi)^\perp$ need not equal $\overline{[\![\Phi]\!]}$. For example, take $\Phi := \{ \delta_n : n \in \mathbb{N} \}$ in $\ell^{1*}$.

9. Let $M$ be a closed subspace of a normed space $X$. The following maps are isomorphisms:

\[ M^\perp \to (X/M)^*, \quad \phi \mapsto \psi, \quad \psi(x + M) := \phi x, \]
\[ X^*/M^\perp \to M^*, \quad \phi + M^\perp \mapsto \phi|_M. \]

Hence, $\dim M^\perp = \operatorname{codim} M$ and $\operatorname{codim} M^\perp = \dim M$, when finite.


11.4 The Adjoint T* 


Recall the adjoint of an operator on Hilbert spaces, $T^* : Y \to X$, defined by the identity $\langle T^*y, x \rangle = \langle y, Tx \rangle$. Is there an analogous definition that can be applied to Banach spaces? First, one needs to recast the defining relation, replacing inner products by functionals, $(T^*y)^*x = y^*Tx$. Although not exactly the same thing, the definition $(T^\top\phi)x := \phi Tx$ captures the essentials of this identity in terms of functionals. The relation between them is $T^* : y \mapsto x \iff T^\top : y^* \mapsto x^*$. More formally, using the Riesz correspondences $J_Y : Y \to Y^*$ and $J_X : X \to X^*$,

\[ T^* = J_X^{-1} T^\top J_Y. \]

$T^*$ is sometimes called the Hilbert adjoint to distinguish it from the adjoint $T^\top$.


Definition 11.27 


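The adjoint¹ of an operator $T : X \to Y$ is $T^\top : Y^* \to X^*$ defined by $(T^\top\phi)x := \phi(Tx)$ for any $\phi \in Y^*$ and $x \in X$.

That $T^\top\phi : X \to \mathbb{F}$ is linear and continuous can be seen from

\[ T^\top\phi(x + y) = \phi T(x + y) = \phi Tx + \phi Ty = T^\top\phi(x) + T^\top\phi(y), \]
\[ T^\top\phi(\lambda x) = \phi T(\lambda x) = \lambda\phi Tx = \lambda T^\top\phi(x), \]
\[ |(T^\top\phi)x| = |\phi(Tx)| \le \|\phi\|\,\|Tx\| \le \|\phi\|\,\|T\|\,\|x\|. \qquad (11.2) \]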


Proposition 11.28 


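$T^\top$ is linear and continuous when $T$ is, and the map $T \mapsto T^\top$ is a linear isometry from $B(X, Y)$ into $B(Y^*, X^*)$,

\[ (S + T)^\top = S^\top + T^\top, \qquad (\lambda T)^\top = \lambda T^\top, \qquad \|T^\top\| = \|T\|. \]

When defined, $(ST)^\top = T^\top S^\top$.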


¹ There is no standard name or notation for the adjoint operator. It has also been called the dual or transpose and denoted by various symbols such as T’, T*, T*, T* etc.

Proof Linearity of $T^\top$: For all $x \in X$, $\phi, \psi \in Y^*$, $\lambda \in \mathbb{F}$,

\[ T^\top(\phi + \psi)(x) = (\phi + \psi)(Tx) = \phi Tx + \psi Tx = (T^\top\phi)x + (T^\top\psi)x, \]
\[ T^\top(\lambda\phi)x = \lambda\phi Tx = (\lambda T^\top\phi)x. \]

That $T^\top$ is continuous follows from $\|T^\top\phi\| \le \|T\|\,\|\phi\|$ by (11.2).
The other assertions are implied by the following statements, true for all $x \in X$ and all $\phi \in Y^*$:

\[ (S + T)^\top\phi x = \phi(Sx + Tx) = \phi Sx + \phi Tx = (S^\top\phi + T^\top\phi)x, \]
\[ (\lambda T)^\top\phi x = \phi(\lambda Tx) = \lambda\phi Tx = (\lambda T^\top\phi)x. \]

Using Proposition 11.19,

\[ \|T\| = \sup_{\|x\|=1}\, \sup_{\|\phi\|=1} |\phi Tx| = \sup_{\|\phi\|=1}\, \sup_{\|x\|=1} |(T^\top\phi)x| = \sup_{\|\phi\|=1} \|T^\top\phi\| = \|T^\top\|. \]

Finally, when $T \in B(X, Y)$, $S \in B(Y, Z)$, and any $\psi \in Z^*$,

\[ (ST)^\top\psi = \psi ST = (S^\top\psi)T = T^\top S^\top\psi. \]


Examples 11.29 


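1. $0^\top = 0$, $I^\top = I$.

2. The adjoint of a (complex) matrix is its transpose, with the columns becoming the rows, $\phi Tx = y \cdot Tx = (T^\top y) \cdot x$, e.g.

\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \cdot \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad\text{so}\quad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\!\top} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}, \]

and generally, $\sum_i y_i \sum_j T_{ij} x_j = \sum_j \bigl(\sum_i T_{ij} y_i\bigr) x_j$, so $T^\top_{ji} = T_{ij}$.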
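The identity $y \cdot Tx = (T^\top y) \cdot x$ is easy to test on a rectangular matrix. The following minimal sketch assumes matrices are lists of rows and uses the bilinear pairing (no complex conjugation), in line with the transpose rather than the Hilbert adjoint; the helper names are illustrative.

```python
def apply(T, x):
    """Matrix-vector product Tx for T given as a list of rows."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def transpose(T):
    """The transpose: columns of T become rows."""
    return [list(col) for col in zip(*T)]

def dot(u, v):
    """Bilinear pairing of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

T = [[1, 2, 3],
     [4, 5, 6]]          # T : F^3 -> F^2
x = [1, 0, -1]
y = [2, 3]
print(dot(y, apply(T, x)), dot(apply(transpose(T), y), x))
```

Note that $T$ maps $\mathbb{F}^3 \to \mathbb{F}^2$ while its transpose maps $\mathbb{F}^2 \to \mathbb{F}^3$, matching the reversal of direction $T^\top : Y^* \to X^*$.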


3. ▸ To find the adjoint of an operator $T$ on the sequence spaces $\ell^1$, $\ell^2$, or $c_0$, the effect of $T$ on a vector $x$ needs to be reevaluated as an effect on a functional $\phi$, which recall is associated to a vector $y$ in the dual spaces $\ell^\infty$, $\ell^2$, or $\ell^1$, respectively,

\[ \phi(Tx) = y \cdot Tx = (T^\top y) \cdot x. \]

For example, to show that the adjoint of the operator $T(a_n) := (a_1, 0, 0, \ldots)$ in $B(\ell^1)$ is $T^\top(b_n) = (0, b_0, 0, \ldots)$ in $B(\ell^\infty)$, consider

\[ y \cdot Tx = (b_0, b_1, b_2, \ldots) \cdot (a_1, 0, 0, \ldots) = b_0 a_1 = (0, b_0, 0, \ldots) \cdot (a_0, a_1, a_2, \ldots). \]

4. ▸ The adjoint of the left-shift operator is the right-shift operator, on $\ell^1$ or $c_0$:

\[ \phi Lx = y \cdot Lx = \sum_{n=0}^{\infty} b_n a_{n+1} = (0, b_0, b_1, \ldots) \cdot (a_0, a_1, a_2, \ldots) = (Ry) \cdot x. \]


5. The adjoint of the Fourier transform $\mathcal{F} : L^1[0, 1] \to c_0$ is $\mathcal{F}^\top : \ell^1 \to L^\infty[0, 1]$
defined by $\mathcal{F}^\top(a_n) = \sum_n a_n e^{-2\pi i n x}$. (Compare with $\mathcal{F}^*$, Exercise 10.35(14).)
Proof For $y = (a_n) \in \ell^1$,
$$y \cdot \mathcal{F}f = \sum_n a_n \int_0^1 e^{-2\pi i n x} f(x)\,dx = \int_0^1 \Bigl(\sum_n a_n e^{-2\pi i n x}\Bigr) f(x)\,dx = (\mathcal{F}^\top y) \cdot f,$$
with the placement of the sum in the integral justified by $\sum_n a_n e^{-2\pi i n x} \in L^\infty[0, 1]$.

* Note that $\ell^1 \subset c_0$, so the composition $\mathcal{F}^\top\mathcal{F}$ is not defined on all of $L^1[0, 1]$,
i.e., rebuilding an $L^1$-function from its Fourier coefficients is not guaranteed to
converge uniformly back to the function. However, with this machinery in place,
it is now easy to prove part of Dirichlet's assertion for periodic functions:


FF: Cr, 1] > oo(Z) Cc C= L™[0, 1] (Exercise 9.27(3)). 
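6. * Even if the codomain of $T : X \to Y$ is reduced to a linear subspace $M$ such
that $\operatorname{im} T \subseteq M \subseteq Y$, the image of $T^\top$ remains the same.
Proof Let $\tilde{T} : X \to M$, $\tilde{T}x := Tx$, be the new operator; then $\tilde{T}^\top : M^* \to X^*$.
Any functional $\phi \in M^*$ can be extended to $\tilde{\phi} \in Y^*$, and for all $x \in X$
$$(\tilde{T}^\top\phi)x = \phi\tilde{T}x = \tilde{\phi}Tx = (T^\top\tilde{\phi})x.$$
Hence $\operatorname{im} \tilde{T}^\top \subseteq \operatorname{im} T^\top$. Conversely, any $\phi \in Y^*$ can be restricted to $M$, and the
same reasoning shows the opposite inclusion.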


7. For a Hilbert space $H$, every operator $T \in B(H)$ is paired up with its adjoint
$T^* \in B(H)$. This fact makes $B(H)$ much more special than spaces of operators
on Banach spaces, as we shall see later in Chapter 16 on C*-algebras.


The Hilbert space fact $\ker T^* = (\operatorname{im} T)^\perp$ generalizes to Banach spaces, but the
closure of $\operatorname{im} T^\top$ is not always $(\ker T)^\perp$.
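
The shift example above (Examples 11.29(4)) can be checked on finite truncations. The following is only a sketch under stated assumptions: sequences are cut to a fixed length, so `left_shift` and `right_shift` are finite-dimensional stand-ins for $L$ and $R$, and the helper names are hypothetical.

```python
def left_shift(x):
    """Truncated L: (a0, a1, a2, ...) -> (a1, a2, ..., 0), keeping the length."""
    return x[1:] + [0.0]

def right_shift(y):
    """Truncated R: (b0, b1, ...) -> (0, b0, b1, ...), keeping the length."""
    return [0.0] + y[:-1]

def dot(y, x):
    """The pairing y . x = sum_i b_i a_i."""
    return sum(b * a for b, a in zip(y, x))

x = [1.0, -2.0, 3.0, 0.5, 4.0]    # plays the role of a vector in l^1
y = [2.0, 1.0, -1.0, 3.0, 0.25]   # plays the role of a functional in l^infinity

lhs = dot(y, left_shift(x))       # y . Lx
rhs = dot(right_shift(y), x)      # (Ry) . x
print(lhs, rhs)
```

Both pairings agree, which is the finite-dimensional shadow of $y \cdot Lx = (Ry) \cdot x$: on truncations the right shift is simply the transposed matrix of the left shift.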
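Proposition 11.30 (Closed Range Theorem)


If $X$, $Y$ are Banach spaces and $T \in B(X, Y)$, then

$$\ker T^\top = (\operatorname{im} T)^\perp, \qquad \ker T = {}^\perp\operatorname{im} T^\top,$$
$$\overline{\operatorname{im} T} = {}^\perp\ker T^\top, \qquad \operatorname{im} T^\top \subseteq (\ker T)^\perp.$$

Moreover, $\operatorname{im} T^\top = (\ker T)^\perp \Leftrightarrow \operatorname{im} T$ is closed $\Leftrightarrow \operatorname{im} T^\top$ is closed.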


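Proof The central statement is, for $T \in B(X, Y)$,
$$\phi Tx = (T^\top\phi)x.$$

If these quantities vanish for all $x \in X$, then the two sides of the equation state
$\phi \in (\operatorname{im} T)^\perp$ and $\phi \in \ker T^\top$, which must therefore be logically equivalent. If they
vanish for all $\phi \in Y^*$, then they state $x \in \ker T$ and $x \in {}^\perp\operatorname{im} T^\top$ respectively.

We have already seen that $\Phi \subseteq A^\perp \Leftrightarrow A \subseteq {}^\perp\Phi$; so the statements in the second
line of the proposition follow from the identities in the top line, using first $\Phi = \ker T^\top$,
$A = \operatorname{im} T$, and secondly $A = \ker T$, $\Phi = \operatorname{im} T^\top$. Moreover (Proposition
11.23),
$$\overline{\operatorname{im} T} = {}^\perp\bigl((\operatorname{im} T)^\perp\bigr) = {}^\perp(\ker T^\top).$$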


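$\operatorname{im} T$ closed $\Rightarrow$ $\operatorname{im} T^\top$ closed: Suppose $\operatorname{im} T$ is
closed. We show that equality holds in $\operatorname{im} T^\top \subseteq (\ker T)^\perp$.
Let $\phi \in (\ker T)^\perp$, i.e., $Tx = 0 \Rightarrow \phi x = 0$. $T$ can be
considered as an onto operator $T : X \to \operatorname{im} T$, so the
mapping $\tilde{\psi} : Tx \mapsto \phi x$ is a well-defined functional on
$\operatorname{im} T$ (Exercise 11.7(4)). It can be extended to a functional
$\psi \in Y^*$ by the Hahn-Banach theorem.

[Diagram: $T : X \to \operatorname{im} T$ and $\phi : X \to \mathbb{F}$, with the induced functional $\operatorname{im} T \to \mathbb{F}$.]

Then, for all $x \in X$,
$$\phi x = \tilde{\psi}Tx = \psi Tx = (T^\top\psi)x,$$
so $\phi = T^\top\psi$ and $\operatorname{im} T^\top$ is equal to the closed subspace $(\ker T)^\perp$.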

Conversely, let $\operatorname{im} T^\top$ be closed, and define $\tilde{T} : X \to \overline{\operatorname{im} T} =: M$, $\tilde{T}x := Tx$;
by Example 11.29(6) above and the fact that the annihilator of $\operatorname{im} \tilde{T}$ in $M$ is 0, it
follows that $\tilde{T}^\top$ is 1-1 and has the closed image $\operatorname{im} T^\top$. Hence, for all $\phi \in M^*$,
$\|\tilde{T}^\top\phi\| \ge c\|\phi\|$ (Proposition 11.3). Now $C := \overline{\tilde{T}B_X}$ is a closed balanced convex
subset of $M$, so by the separating hyperplane theorem, any $y \in M$ with $y \notin C$ can be separated
from it by means of a functional $\psi \in M^*$,
$$\forall x \in B_X, \quad |\psi\tilde{T}x| \le r < |\psi y|.$$

Note that $\|\tilde{T}^\top\psi\| \le r$. Then
$$r < |\psi y| \le \|\psi\|\,\|y\| \le \frac{1}{c}\|\tilde{T}^\top\psi\|\,\|y\| \le \frac{r}{c}\|y\|$$
and $\|y\| > c$. This implies that $\overline{\tilde{T}B_X}$ contains the ball $B_c(0)$. But we have already
seen in the proof of the open mapping theorem that when this is the case, then $\tilde{T}B_X$
contains some open ball $B_\epsilon(0)$ of $M$. This can only be true if $\tilde{T}$ is onto, that is,
$\operatorname{im} T = \operatorname{im} \tilde{T}$ is equal to the closed space $M$. □


Proposition 11.31 Schauder's Theorem
If $T$ is compact then so is its adjoint $T^\top$.


Proof Let $T : X \to Y$ be a compact operator, so $TB_X$ is totally bounded in $Y$, that
is, for arbitrarily small $\epsilon > 0$, it can be covered by a finite number of balls $B_\epsilon(Tx_i)$
where $x_1, \ldots, x_N \in B_X$. We want to show that $T^\top$ maps the unit ball of functionals
$B_{Y^*} \subseteq Y^*$ to a totally bounded set of functionals in $X^*$.

The linear map $S : Y^* \to \mathbb{F}^N$ defined by $S\psi := (\psi Tx_1, \ldots, \psi Tx_N)$ is continuous
(because $T$ is, and $N$ is finite), so compact of finite-rank. Hence $SB_{Y^*}$ is totally
bounded in $\mathbb{F}^N$ and can be covered by balls $B_\epsilon(S\psi_j)$ for a finite number of $\psi_j \in B_{Y^*}$.
We now show that balls of radius $4\epsilon$ centered at $T^\top\psi_j$ cover $T^\top B_{Y^*}$. For any
$\psi \in B_{Y^*}$ and any $x \in B_X$, there are $Tx_i$ and $S\psi_j$ close to $Tx$ and $S\psi$ respectively,
resulting in
$$\begin{aligned}
|\psi Tx - \psi_j Tx| &\le |\psi Tx - \psi Tx_i| + |\psi Tx_i - \psi_j Tx_i| + |\psi_j Tx_i - \psi_j Tx| \\
&\le \|\psi\|\,\|Tx - Tx_i\| + \|S\psi - S\psi_j\|_{\mathbb{F}^N} + \|\psi_j\|\,\|Tx_i - Tx\| \\
&< \|\psi\|\epsilon + \epsilon + \|\psi_j\|\epsilon \\
&\le 3\epsilon.
\end{aligned}$$
So $\|T^\top\psi - T^\top\psi_j\| \le 3\epsilon$, and $T^\top B_{Y^*} \subseteq \bigcup_j B_{4\epsilon}(T^\top\psi_j)$. □


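Exercises 11.32

1. The adjoint of a multiplier operator $M_y(x) := yx$, where $M_y \in B(\ell^1)$, is
$M_y \in B(\ell^\infty)$.

2. The adjoint of a finite-rank operator $Tx := \sum_{n=1}^N (\phi_n x)e_n$ is another finite-rank
operator $T^\top\psi = \sum_{n=1}^N (\psi e_n)\phi_n$.

3. The adjoint is continuous: If $T_n \to T$ then $T_n^\top \to T^\top$.

4. $T$ maps a linear subspace $M$ onto $TM$; show $T^\top$ maps $(TM)^\perp$ into $M^\perp$. So, if
$M$ is $T$-invariant, i.e., $TM \subseteq M$, then $M^\perp$ is $T^\top$-invariant.

5. * In the embedding of $X$ in $X^{**}$, show that $T^{\top\top} : X^{**} \to Y^{**}$ is an extension
of $T : X \to Y$ in the sense that $T^{\top\top}x^{**} = (Tx)^{**}$.

6. $T^\top$ is 1-1 $\Leftrightarrow$ $\operatorname{im} T$ is dense in $Y$; and $\operatorname{im} T^\top$ is dense in $X^*$ $\Rightarrow$ $T$ is 1-1.

7. Let $T \in B(X, Y)$,
$T$ is an isomorphism $\Leftrightarrow$ $T^\top$ is an isomorphism, with $(T^\top)^{-1} = (T^{-1})^\top$,
$\Leftrightarrow$ $T$ and $T^\top$ are onto.

8. A necessary condition for the equation $Tx = y$ to have a solution in $x$ is that $y$
have the property $T^\top\phi = 0 \Rightarrow \phi y = 0$. When is it also sufficient?

9. If $P$ is a projection, then so is $P^\top$, with kernel $(\operatorname{im} P)^\perp$ and image $(\ker P)^\perp$.
Deduce
$$X = M \oplus N \ \Rightarrow\ X^* = N^\perp \oplus M^\perp.$$

10. If $T$ is a Fredholm operator (Definition 11.12), then so is $T^\top$ and its index
is $\operatorname{index}(T^\top) = -\operatorname{index}(T)$. Moreover, $\operatorname{index}(T) = \dim\ker T - \dim\ker T^\top$.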
(Hint: Exercise 11.26(9).)


11.5 Pointwise and Weak Convergence


We have already encountered two types of convergence for operators $T_n \in B(X, Y)$,
to which can be added yet another, weaker, type:

(i) Convergence in norm
$$T_n \to T \ \Leftrightarrow\ \|T_n - T\| \to 0,$$
(ii) Strong, or pointwise, convergence
$$T_n x \to Tx \ \ \forall x \in X \ \Leftrightarrow\ \|T_n x - Tx\|_Y \to 0 \ \ \forall x \in X,$$
(iii) Weak convergence
$$T_n \rightharpoonup T \ \Leftrightarrow\ \phi T_n x \to \phi Tx \ \ \forall x \in X,\ \forall \phi \in Y^*.$$


Examples 11.33 


1. ▸ Convergence in norm is "stronger" than pointwise convergence, since for each
$x \in X$,
$$\|T_n x - Tx\| = \|(T_n - T)x\| \le \|T_n - T\|\,\|x\| \to 0.$$

But the converse is false: it is possible to have pointwise convergence without
convergence in norm. For example, let $\delta_N : \ell^1 \to \mathbb{C}$ be defined by $\delta_N(a_n) := a_N$;
then $\delta_N x \to 0$ as $N \to \infty$ for each $x \in \ell^1$, but $\|\delta_N\| = 1$.

Similarly, when defined on $c$, $\delta_N$ converge pointwise to Lim, since $\delta_N(a_n) =
a_N \to \lim_{n\to\infty} a_n$, yet $\delta_N \not\to \mathrm{Lim}$ (because $\delta_N = e_N^\top$ can converge only if $e_N$
converge in $\ell^1$).


2. Another example is the projection operator defined as $n$ left shifts followed by
$n$ right shifts, $T_n := R^n L^n : \ell^1 \to \ell^1$. It converges pointwise to the 0 operator,
since for each $x = (a_i) \in \ell^1$, $\|R^n L^n x\| = \sum_{i \ge n} |a_i| \to 0$. However there are
sequences, such as $x := e_n$, for which $T_n x = x$, so that $\|T_n\| = 1 \not\to 0$.


3. If $T_n$ converge pointwise, $T_n x \to Tx$, $\forall x$, it does not follow that $T_n^\top$ converge
pointwise, $T_n^\top\phi \to T^\top\phi$, $\forall\phi$. For example, in $\ell^1$, $L^n x \to 0$ for the left-shift
operator $L$; but $R^n x \not\to 0$ in $\ell^\infty$. Another example is $T_n(a_i) := (a_n, 0, 0, \ldots)$.
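
Example 2 above can be illustrated numerically. This is a sketch only: the sequence $x$ is a truncated geometric sequence chosen for illustration, and `project_tail`, `norm1` and `unit_vector` are hypothetical helper names. The operator $T_n = R^n L^n$ simply zeroes the first $n$ entries.

```python
def project_tail(x, n):
    """T_n = R^n L^n: n left shifts followed by n right shifts zero the first n entries."""
    return [0.0] * min(n, len(x)) + x[n:]

def norm1(x):
    """The l^1 norm of a finite sequence."""
    return sum(abs(a) for a in x)

N = 60
x = [2.0 ** -(i + 1) for i in range(N)]      # a fixed summable sequence (truncated)

def unit_vector(n, length=N):
    e = [0.0] * length
    e[n] = 1.0
    return e

indices = (0, 5, 10, 20)
tail_norms = [norm1(project_tail(x, n)) for n in indices]                    # ||T_n x|| for fixed x
witness_norms = [norm1(project_tail(unit_vector(n), n)) for n in indices]    # ||T_n e_n||

for n, t, w in zip(indices, tail_norms, witness_norms):
    print(n, t, w)
```

For the fixed vector the values $\|T_n x\|$ decay, while $\|T_n e_n\| = 1$ for every $n$, so the operator norms cannot tend to 0.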


It often happens that a map is defined as the pointwise limit of a sequence of
operators, $T(x) := \lim_{n\to\infty} T_n x$, assuming this is defined for all $x \in X$. It is then
natural to ask what properties $T$ enjoys: That it is linear is easy to prove, but
is it also necessarily continuous? The answer is yes when $X$ is a Banach space, as
follows from the following stronger assertion:

Theorem 11.34 Banach-Steinhaus's Uniform Bounded theorem


For a Banach space $X$ and $T_i \in B(X, Y)$,
$$\forall x \in X,\ \exists C_x \ge 0,\ \forall i,\ \|T_i x\| \le C_x \quad\Rightarrow\quad \exists C \ge 0,\ \forall i,\ \|T_i\| \le C.$$

(The index set of $i$ need not be countable.)


Proof The sets $A_c := \{x \in X : \forall i,\ \|T_i x\| \le c\}$ are closed, since if $x_m \in A_c$
and $x_m \to x$, then taking the limit $m \to \infty$ in the inequality $\|T_i x_m\| \le c$, we find
$\|T_i x\| \le c$, by continuity of $T_i$ and the norm, showing $x \in A_c$.

The given hypothesis is that $X = \bigcup_{n=1}^\infty A_n$. By Baire's category theorem, not all
these sets can be nowhere dense. That is, there must be at least one $N$ for which $A_N$
contains a ball $B_r(a)$, in fact $B_r[a]$ since $A_N$ is closed,
$$\forall y \in X, \quad \|y - a\| \le r \ \Rightarrow\ \forall i,\ \|T_i y\| \le N.$$

Thus
$$\|y - a\| \le r \ \Rightarrow\ \|T_i(y - a)\| \le \|T_i y\| + \|T_i a\| \le 2N,$$
which can be rewritten as $\bigl\|\frac{y-a}{r}\bigr\| \le 1 \Rightarrow \bigl\|T_i\bigl(\frac{y-a}{r}\bigr)\bigr\| \le 2N/r$. But every vector $x$
can be written in this form, $\frac{x}{\|x\|} = \frac{y-a}{r}$ for a suitable $y$, so for all $i$,
$$\|T_i x\| \le \frac{2N}{r}\|x\|. \qquad \square$$


Corollary 11.35 


If $T_n \in B(X, Y)$ with $X$ a Banach space, and $T_n x \to T(x)$ for all $x$, then $T$
is linear and continuous,
$$\|T\| \le \liminf_n \|T_n\|.$$

Proof $T$ is necessarily linear, by continuity of addition and scalar multiplication
(see the proof of Theorem 8.7). Any convergent sequence is bounded, so $\|T_n x\|$ is
bounded for each $x$, from which follows that $\forall n$, $\|T_n\| \le C$, by the uniform bounded
theorem.
If we now choose a subsequence of $T_n$, for which $\|T_n\| \to \alpha := \liminf_n \|T_n\|$,
and take the limit $n \to \infty$ of $\|T_n x\| \le \|T_n\|\,\|x\|$, we get $\|Tx\| \le \alpha\|x\|$ and $\|T\| \le \alpha$.
□

Examples 11.36


1. The uniform bounded theorem can be restated as: If $\|T_n\| \to \infty$ then there must
be a vector $x$ such that $\|T_n x\| \to \infty$.

* In fact the set of such $x$ is dense in $X$: If any $A_i$ of the proof contains a ball, the
conclusion of the theorem would hold; so with all $A_i$ nowhere dense in $X$, the
complement of $\bigcup_i A_i$ is dense (Remark 4.22(1)).


2. A common error is to define or prove $Tx = \sum_n T_n x$ for all $x$ and then deduce
$T = \sum_n T_n$. It is true that two functions are the same, $f = g$, when $f(x) = g(x)$
for all $x \in X$, but the point is that the meaning of the limit in the sum $\sum_n$ differs
in the two expressions, the first occurring in $Y$ and the second in $B(X, Y)$.


3. * Let $S_N f := \sum_{n=-N}^{N} \hat{f}(n)e^{2\pi i n x}$, where $x \in [0, 1]$ is fixed and $f \in C[0, 1]$.
Show (a) $S_N$ is a functional on $C[0, 1]$, (b) $S_N f = D_N * f$, where
$$D_N(x) := \sum_{n=-N}^{N} e^{2\pi i n x} = \frac{\sin\bigl((2N+1)\pi x\bigr)}{\sin \pi x}$$
is called the Dirichlet kernel, and (c) $\|S_N\| = \|D_N\|_{L^1}$. Assuming one can show
that $\int_0^1 |D_N(x)|\,dx \to \infty$, use the uniform boundedness theorem to deduce that
there is a dense set of continuous functions $f$ in $C[0, 1]$ for which the Fourier series does
not converge at $x$, $S_N f \to \infty$.
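
The growth of $\int_0^1 |D_N(x)|\,dx$ assumed in Example 3 can be observed numerically. The sketch below is not a proof: it uses a midpoint rule with an arbitrarily chosen number of steps, and the function names are hypothetical.

```python
import math

def dirichlet(N, x):
    """D_N(x) = sum_{n=-N}^{N} e^{2 pi i n x}, summed directly (the sum is real)."""
    return 1.0 + 2.0 * sum(math.cos(2.0 * math.pi * n * x) for n in range(1, N + 1))

def dirichlet_closed(N, x):
    """Closed form sin((2N+1) pi x) / sin(pi x), valid when x is not an integer."""
    return math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x)

def l1_norm(N, steps=20000):
    """Midpoint-rule approximation of the integral of |D_N| over [0, 1]."""
    h = 1.0 / steps
    return h * sum(abs(dirichlet_closed(N, (k + 0.5) * h)) for k in range(steps))

norms = {N: l1_norm(N) for N in (1, 4, 16, 64)}
for N, value in norms.items():
    print(N, round(value, 3))
```

The computed norms increase slowly with $N$, which is consistent with the unboundedness of $\|S_N\| = \|D_N\|_{L^1}$ needed to apply the uniform boundedness theorem.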
Weak Convergence

Let us now consider weak convergence of operators
$$T_n \rightharpoonup T \ \Leftrightarrow\ \phi T_n x \to \phi Tx \ \ \forall x \in X,\ \forall \phi \in Y^*.$$
For vectors (considered as operators $\mathbb{F} \to X$, $\lambda \mapsto \lambda x$), weak convergence takes the
form
$$x_n \rightharpoonup x \ \Leftrightarrow\ \phi x_n \to \phi x, \ \ \forall \phi \in X^*.$$

For functionals ($X \to \mathbb{F}$), this convergence is called weak* convergence and coincides
with their pointwise convergence,
$$\phi_n \overset{*}{\rightharpoonup} \phi \ \Leftrightarrow\ \phi_n x \to \phi x, \ \ \forall x \in X.$$

One must guard against a possible source of confusion: the weak convergence of
functionals, when thought of as vectors in $X^*$, is different
$$\phi_n \rightharpoonup \phi \ \Leftrightarrow\ \Psi\phi_n \to \Psi\phi, \ \ \forall \Psi \in X^{**},$$
hence the need for a new name.


Examples 11.37 


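1. Strong convergence implies weak convergence because, by continuity of $\phi$,
$T_n x \to Tx \Rightarrow \phi T_n x \to \phi Tx$.


2. ▸ But the converse is false in general: For example, in $c_0$, $R^n \rightharpoonup 0$, since for any
$x = (a_i) \in c_0$, and $y = (b_i) \in \ell^1 \equiv c_0^*$,
$$|y \cdot R^n x| = \Bigl|\sum_{i=0}^{\infty} b_{i+n} a_i\Bigr| \le \sum_{i=n}^{\infty} |b_i|\,\|x\| \to 0 \text{ as } n \to \infty,$$
yet $R^n x \not\to 0$,
$$\|R^n x\| = \|(0, \ldots, 0, a_0, a_1, \ldots)\|_{c_0} = \|x\| \not\to 0.$$


3. To prove weak convergence, $x_n \rightharpoonup x$, given that $(x_n)$ is bounded in $X$, it is enough
to check $\psi x_n \to \psi x$ for $\psi$ in a dense subset of $X^*$.

Proof Any $\phi \in X^*$ can be approximated by functionals $\psi_n \to \phi$, by their density
in $X^*$. For $y_n := x_n - x$ (bounded), it is not hard to show that $\psi_n y_n \to 0$, so
$$\phi y_n = \psi_n y_n + (\phi - \psi_n)y_n \to 0 \text{ as } n \to \infty.$$

4. Weak convergence of vectors and operators in an inner product space becomes
$$x_n \rightharpoonup x \ \Leftrightarrow\ \langle y, x_n\rangle \to \langle y, x\rangle \text{ as } n \to \infty, \ \ \forall y \in X,$$
$$T_n \rightharpoonup T \ \Leftrightarrow\ \langle y, T_n x\rangle \to \langle y, Tx\rangle \text{ as } n \to \infty, \ \ \forall x, y \in X.$$


5. In an inner product space,
$$x_n \rightharpoonup x \ \text{AND}\ \|x_n\| \to \|x\| \ \Leftrightarrow\ x_n \to x.$$
Proof When $x_n \rightharpoonup x$, we get $\langle x, x_n\rangle \to \langle x, x\rangle$ since $x^*$ is a functional, so
$$\|x - x_n\|^2 = \|x\|^2 - 2\operatorname{Re}\langle x, x_n\rangle + \|x_n\|^2 \to 0.$$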

Proposition 11.38 


In finite dimensions, all three convergence types are equivalent.

Proof Let $A_n \rightharpoonup A$ where $A_n$, $A$ are $M \times N$ matrices. This means that for any
$\phi \in (\mathbb{F}^M)^*$ and $x \in \mathbb{F}^N$, $\phi(A_n - A)x \to 0$ as $n \to \infty$. In particular if we let
$\phi = e_i^\top$, $x = e_j$ be basis vectors for $(\mathbb{F}^M)^*$ and $\mathbb{F}^N$ respectively, then each component
of $A_n$ converges to the corresponding component in $A$:
$$A_{n,ij} = e_i^\top A_n e_j \to e_i^\top A e_j = A_{ij}, \text{ as } n \to \infty.$$

This then implies that $\|A_n - A\| \le \sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N} |A_{n,ij} - A_{ij}|^2} \to 0$ (Example
8.9(2)). □


The analogous result of the Banach-Steinhaus theorem for weak convergence is
also true, but more care is needed: Although every convergent sequence is bounded
(Example 4.3(5)), that fact was proved using a metric, whereas weak convergence
$T_n \rightharpoonup T$ is not equivalent, in general, to such a strong type of convergence as
$d(T_n, T) \to 0$ for any distance function.


Proposition 11.39 


If $T_n \rightharpoonup T$ where $T_n \in B(X, Y)$, $X$ a Banach space, then
(i) $\{T_n : n \in \mathbb{N}\}$ is bounded, and
(ii) $T \in B(X, Y)$ with $\|T\| \le \liminf \|T_n\|$.


Proof (i) Let $T_n \rightharpoonup T$; the set $\{T_1 x, T_2 x, \ldots\}$ is weakly bounded in the sense that
for all $n \in \mathbb{N}$, $\phi \in Y^*$, $|\phi T_n x| \le C_{\phi,x}$, since $(\phi T_n x)$ is a convergent sequence
in $\mathbb{C}$. But an application of the uniform bounded theorem twice shows first that
$\|T_n x\| \le C_x$, and then that $T_n$ is bounded. Of course, a simplified version of this
argument applies equally well to weakly convergent sequences of vectors $x_n \rightharpoonup x$
and to weak*-convergent sequences of functionals $\phi_n \overset{*}{\rightharpoonup} \phi$.


(ii) Take the limit of $\phi T_n(x + y) = \phi T_n x + \phi T_n y$ and $\phi T_n(\lambda x) = \lambda\phi T_n x$ to show
linearity of $T$. Similarly, the set $\{\|T_n\| : n \in \mathbb{N}\}$ is bounded in $\mathbb{R}$ and possesses a
smallest limit point $\alpha$, so taking a subsequence of $\|T_n\|$ which converges to it, we
obtain, for all $x \in X$, $\phi \in Y^*$,
$$\begin{array}{ccc}
|\phi T_n x| & \le & \|\phi\|\,\|T_n\|\,\|x\| \\
\downarrow & & \downarrow \\
|\phi T x| & \le & \alpha\|\phi\|\,\|x\|
\end{array}$$
and $\|T\| \le \alpha$ follows. Thus $B(X, Y)$ is closed under weak convergence. □


As a partial converse there is

Theorem 11.40


When $X$ is a separable Banach space, every bounded sequence in $X^*$ has
a weak*-convergent subsequence.

If $x_1, x_2, \ldots \in X$ are dense in the unit ball, then $X^*$ has a norm
$$\|\phi\|_w := \sum_{n=1}^{\infty} \frac{1}{2^n}|\phi x_n| \le \|\phi\|$$
such that for $\phi_n$ bounded,
$$\phi_n \overset{*}{\rightharpoonup} \phi \ \Leftrightarrow\ \|\phi_n - \phi\|_w \to 0.$$

Thus the unit closed ball of $X^*$ is a compact metric space with this norm.


This theorem can be generalized to non-separable spaces (see [10]), when it is
known as the Banach-Alaoglu theorem: The unit closed ball of $X^*$ is a compact
topological space.


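Proof (i) Let $\{x_m\}$ be a countable dense subset of $X$, and suppose $\|\phi_n\| \le c$.
Then the sequence of complex numbers $\phi_n x_1$ is bounded, $|\phi_n x_1| \le c\|x_1\|$, and so
must have a convergent subsequence (Exercise 6.9(6)), which we shall denote by
$\phi_{1,n}x_1 \to \psi(x_1)$. This subsequence is also bounded on $x_2$, $|\phi_{1,n}x_2| \le c\|x_2\|$, and so
we can extract, by the same means, a convergent sub-subsequence, $\phi_{2,n}x_2$. Notice
that, not only does $\phi_{2,n}x_2 \to \psi(x_2)$ but also $\phi_{2,n}x_1 \to \psi(x_1)$. Continuing this
way, we get subsequences $\phi_{m,n}$ and numbers $\psi(x_m)$ such that $\phi_{m,n}x_i \to \psi(x_i)$, for
$i \le m$, and $|\psi(x_m)| \le c\|x_m\|$.

$$\begin{array}{l|lllll|l}
\phi_n & \phi_1 & \phi_2 & \phi_3 & \phi_4 & \phi_5 \ldots & \\
\phi_{1,n} & \phi_1 & \phi_3 & \phi_4 & \phi_5 & \ldots & \phi_{1,n}x_1 \to \psi(x_1) \\
\phi_{2,n} & \phi_1 & \phi_3 & \phi_5 & \ldots & & \phi_{2,n}x_2 \to \psi(x_2) \\
\phi_{3,n} & \phi_3 & \phi_5 & \ldots & & & \phi_{3,n}x_3 \to \psi(x_3) \\
\phi_{k,k} & \phi_1 & \phi_3 & \ldots & & & \phi_{k,k}x_m \to \psi(x_m)
\end{array}$$

Let $\psi_k := \phi_{k,k}$, a subsequence of the original sequence $\phi_n$. In fact, $\psi_k$ is a
subsequence of every $\phi_{m,n}$ from some point onward ($k \ge m$), so $\psi_k x_m \to \psi(x_m)$,
as $k \to \infty$. This implies that the function $\psi$ is Lipschitz on the dense set $\{x_m\}$,
$$|\psi(x_n) - \psi(x_m)| = \lim_{k\to\infty} |\psi_k x_n - \psi_k x_m| = \lim_{k\to\infty} |\phi_{k,k}(x_n - x_m)| \le c\|x_n - x_m\|,$$
and so can be extended uniformly to a continuous function on $X$ (Theorem 4.13),
and still satisfying $|\psi(x)| \le c\|x\|$. It is linear, as seen by taking the limit $k \to \infty$ of
$$\psi_k(x + y) = \psi_k x + \psi_k y \quad\text{and}\quad \psi_k(\lambda x) = \lambda\psi_k x.$$
Now, for any $\epsilon > 0$, there is an $x_m$ close to $x$, $\|x - x_m\| < \epsilon$, so that
$$\begin{aligned}
\exists K \in \mathbb{N},\ k \ge K \ &\Rightarrow\ |\psi_k x_m - \psi x_m| < \epsilon \\
&\Rightarrow\ |\psi_k x - \psi x| \le |\psi_k x - \psi_k x_m| + |\psi_k x_m - \psi x_m| + |\psi x_m - \psi x| < (2c + 1)\epsilon,
\end{aligned}$$
in other words $\psi_k x \to \psi x$ for all $x$, or $\psi_k \overset{*}{\rightharpoonup} \psi$, as $k \to \infty$.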


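(ii) That $\|\phi\|_w$ is well-defined and bounded by $\|\phi\|$ follows from $|\phi x_n| \le \|\phi\|\,\|x_n\| \le
\|\phi\|$; that it is a norm follows from $|\phi x_n + \psi x_n| \le |\phi x_n| + |\psi x_n|$ and $|\lambda\phi x_n| =
|\lambda|\,|\phi x_n|$, as well as
$$0 = \|\phi\|_w = \sum_{n=1}^{\infty} \frac{1}{2^n}|\phi x_n| \ \Rightarrow\ \forall n,\ |\phi x_n| = 0 \ \Rightarrow\ \phi = 0$$
since $\{x_n\}$ is dense in $B_X$.


(iii) When $\|\phi_n\| \le c$, $\phi_n \overset{*}{\rightharpoonup} \phi \Leftrightarrow \|\phi_n - \phi\|_w \to 0$: It is enough to consider
functionals $\phi_n$ such that $\phi_n \overset{*}{\rightharpoonup} 0$. Let $\epsilon > 0$ and $M$ large enough that $1/2^M < \epsilon$. For
all $m$, $\phi_n x_m \to 0$ as $n \to \infty$; this convergence may not be uniform in $m$, but it will
be for the first $M$ points $x_1, \ldots, x_M$, i.e.,
$$\exists N,\ n \ge N \ \Rightarrow\ |\phi_n x_m| < \epsilon, \ \ \forall m = 1, \ldots, M.$$

So $\|\phi_n\|_w \to 0$, because for $n \ge N$,
$$\|\phi_n\|_w = \sum_{m=1}^{\infty} \frac{1}{2^m}|\phi_n x_m| \le \sum_{m=1}^{M} \frac{\epsilon}{2^m} + \sum_{m=M+1}^{\infty} \frac{1}{2^m}\|\phi_n\|\,\|x_m\| < \epsilon + c\epsilon.$$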
m=1 m=1 m=M+1 


Conversely, let $\phi_n$ be bounded functionals such that $\|\phi_n\|_w \to 0$. This implies
that for any fixed $m$,
$$\frac{1}{2^m}|\phi_n x_m| \le \sum_{k=1}^{\infty} \frac{1}{2^k}|\phi_n x_k| \to 0, \text{ as } n \to \infty, \qquad (11.3)$$
so $\phi_n x_m \to 0$. For any $x \in X$, choose $x_m$ close to within $\epsilon$ of $y := x/\|x\|$. This is
possible because $\{x_m\}$ are dense in the unit ball. Then, for $n$ large enough,
$$\begin{aligned}
|\phi_n y| &\le |\phi_n x_m| + |\phi_n(x_m - y)| < \epsilon + c\epsilon \\
\Rightarrow\ |\phi_n x| &\le (1 + c)\|x\|\epsilon.
\end{aligned}$$
Hence $\phi_n x \to 0$ for any $x$ and so $\phi_n \overset{*}{\rightharpoonup} 0$.


(iv) $B_{X^*}$ is compact with respect to $\|\cdot\|_w$: Every sequence $\phi_n$ in $B_1[0]$ has a weak*-
convergent subsequence by (i), i.e., $\|\phi_n - \phi\|_w \to 0$. For any $x \in X$,
$$|\phi x| = \lim_{n\to\infty} |\phi_n x| \le \sup_n \|\phi_n\|\,\|x\| \le \|x\|,$$
so $\|\phi\| \le 1$, and $B_1[0]$ has the Bolzano-Weierstraß property of compactness (Theorem
6.21). Note carefully, however, that it is not necessarily compact in the standard
norm of $X^*$; only in finite-dimensions are balls totally bounded (Proposition 8.23).
□


Examples 11.41


1. In a Banach space, if $x_n \rightharpoonup x$ and $\{x_n : n \in \mathbb{N}\}$ is totally bounded, then $x_n \to x$.

Proof Every sequence in a totally bounded set has a Cauchy subsequence, so $x_{n_i} \to y \in X$.
By continuity, $\phi x_{n_i} \to \phi y = \phi x$ for any functional $\phi$, hence $x = y$. If $x_n \not\to x$,
then one could find another Cauchy subsequence converging to $y \ne x$, which is
impossible.


2. A subset $A \subseteq X$ is said to be weakly bounded when $\forall\phi \in X^*$, $\phi A$ is bounded. It
turns out that $A$ is weakly bounded $\Leftrightarrow$ $A$ is bounded.

Proof Given that $|\phi a| \le R_\phi$ for all $a \in A$ and $\phi \in X^*$, recall that $\phi a = a^{**}\phi$,
where $a^{**} : X^* \to \mathbb{F}$, so the uniform bounded theorem can be used to yield
$\|a\| = \|a^{**}\| \le C$.

The idea of using functionals to transfer sets in $X$ to sets in $\mathbb{F}$ is so convenient and
useful that it is applied, not just to convergence, but to various other properties.
In a general sense, we say that a set $A \subseteq X$ is weakly $P$ when for all $\phi \in X^*$, $\phi A$
has the property $P$.


3. A vector $x$ is a weak limit point of a subset $A$ when for any $\phi \in X^*$, every open
ball in $\mathbb{F}$ which contains $\phi x$ also contains another point $\phi a$ for $a \in A$, $a \ne x$. $A$ is
said to be weakly closed when it contains all its weak limit points. Every weakly
closed set is closed, since $x_n \to x \Rightarrow x_n \rightharpoonup x$.


4. If $T$ is linear and $\phi T$ is continuous for each $\phi \in Y^*$ (i.e., $x_n \to x \Rightarrow Tx_n \rightharpoonup Tx$),
then in fact $T$ is continuous.

Proof For every bounded set $B$, $\phi TB$ is bounded by continuity. So $TB$ is weakly
bounded, which is the same as bounded.


5. A Hilbert space is weakly complete, i.e., if $(\phi x_n)$ is Cauchy in $\mathbb{F}$ for each $\phi \in H^*$,
then $x_n \rightharpoonup x$ for some $x$.

Proof Let $\phi(y) := \lim_{n\to\infty} \langle x_n, y\rangle$; $\phi$ is linear and continuous by the uniform
bounded theorem, so must be of the form $\phi = \langle x, \cdot\rangle$ and $\langle x_n, y\rangle \to \langle x, y\rangle$ for
each $y$.

6. * Closed and bounded sets of a Hilbert space are weakly sequentially compact,
meaning any bounded sequence has a weakly convergent subsequence.

Proof Let $M := \overline{[\![x_1, x_2, \ldots]\!]}$. Then $M^* \cong M$, so Theorem 11.40 can be used
to conclude that there is a subsequence $x_{n_i}$ that converges weakly in $M$, i.e.,
$\langle a, x_{n_i}\rangle \to \langle a, x\rangle$ for all $a \in M$. But in fact, for any vector $y \in H$ and its
orthogonal projection $Py \in M$, $\langle y, x_{n_i}\rangle = \langle Py, x_{n_i}\rangle \to \langle Py, x\rangle = \langle y, x\rangle$.


7. If $x_n \rightharpoonup x \Rightarrow Tx_n \to Tx$, then $T$ is compact.

Proof If $B$ is a bounded set and $Tx_n$ any sequence in $TB$, then $x_n \in B$ has
a weakly convergent subsequence by the note above; by hypothesis its image
converges, $Tx_{n_i} \to Tx$, and is thus a Cauchy sequence in $TB$. $TB$ is therefore
totally bounded.


8. * The "Least Distance Theorem" 10.11 can be generalized to when $M$ is weakly
closed. (Note that closed convex subsets are weakly closed.)

Proof The sequence $(y_n)$ of the theorem is bounded, hence has a weakly convergent
subsequence $y_{n_i} \rightharpoonup y_* \in M$. Moreover $\|y_n - x\| \to d$. Taking the limit of
$|\langle y_{n_i} - x, y_* - x\rangle| \le \|y_{n_i} - x\|\,\|y_* - x\|$ gives $\|y_* - x\| \le d$.


Exercises 11.42

1. Show $e_n \rightharpoonup 0$ in $c_0$ or $\ell^2$, yet $e_n \not\to 0$.

2. (a) For $\ell^1$, $\ell^2$, and $\ell^\infty$, if $x_n = (a_{ni}) \rightharpoonup x = (a_i)$ then each component
converges, $a_{ni} \to a_i$ as $n \to \infty$. But the converse is false; e.g. $e_n \not\rightharpoonup 0$ in $\ell^1$.

(b) For $\ell^2$, $x_n \rightharpoonup x$ if, and only if, $x_n$ are bounded and each component converges,
$a_{ni} \to a_i$. (Hint: approximate any $\phi$ by $\sum_{j=1}^{N} b_j e_j^\top$.) Can you generalize
this to $\ell^p$ ($1 < p < \infty$)?

3. In $L^1[0, 1]$, the functions $f_n(x) := e^{2\pi i n x}$ converge weakly $f_n \rightharpoonup 0$, but not
pointwise, $f_n(x) \not\to 0$ at any $x$ (see Theorem 9.25).

4. ▸ The weak limit of $T_n$, if it exists, is unique. A subsequence of $T_n$ also converges
to the same weak limit.

5. ▸ If $x_n \rightharpoonup x$ then $Tx_n \rightharpoonup Tx$, for $T \in B(X, Y)$.

6. Show that the norm is not continuous with respect to weak convergence, by
finding a sequence in $c_0$ such that $x_n \rightharpoonup x$ yet $\|x_n\| \not\to \|x\|$. Similarly, the inner
product of a Hilbert space is not weakly continuous: $x_n \rightharpoonup x$, $y_n \rightharpoonup y$ do not
imply $\langle x_n, y_n\rangle \to \langle x, y\rangle$.

7. In a Hilbert space with an orthonormal basis $e_n$,
(a) $e_n \rightharpoonup 0$,
(b) $\sum_n \alpha_n e_n \rightharpoonup x \Leftrightarrow \sum_n \alpha_n e_n \to x$.
(Hint: The series is bounded, by Proposition 11.39, i.e., $\|\alpha_1 e_1 + \cdots + \alpha_n e_n\| \le c$
and so $(\alpha_n) \in \ell^2$; or use Example 11.37(5).)

8. $\phi_n \rightharpoonup \phi \Rightarrow \phi_n \overset{*}{\rightharpoonup} \phi$.

9. * Schur's theorem: In $\ell^1$, weak convergence of $x_n$ is the same as convergence in
norm. Prove this as follows:
(a) If the statement were false there would be unit $x_n = (a_{ni}) \in \ell^1$ such that
$x_n \rightharpoonup 0$.
(b) For each $n$ there is an $N_n$ such that $\sum_{i \le N_n} |a_{ni}| > \frac{4}{5}$.
(c) Each coefficient converges to 0 as $n \to \infty$, so
$$\forall k,\ \exists M,\ n \ge M \ \Rightarrow\ \sum_{i<k} |a_{ni}| < \frac{1}{5}.$$
(d) A subsequence of $(x_n)$ exists with
$$\sum_{i<N_{n-1}} |a_{ni}| < \frac{1}{5}, \qquad \sum_{i=N_{n-1}}^{N_n} |a_{ni}| > \frac{3}{5}, \qquad \sum_{i>N_n} |a_{ni}| < \frac{1}{5}.$$
(e) Let $y := (|a_{ni}|/a_{ni}) \in \ell^\infty$ where for each $i$, $n$ is such that $N_{n-1} \le i < N_n$.
Show $|y \cdot x_n| \ge \frac{1}{5}$ to obtain a contradiction.

10. Addition and scalar multiplication are continuous with respect to weak
convergence, that is, if $T_n \rightharpoonup T$ and $S_n \rightharpoonup S$ then $T_n + S_n \rightharpoonup T + S$, and
$\lambda T_n \rightharpoonup \lambda T$. Of course, they are also continuous with respect to norm-wise and
strong convergence.

11. Multiplication of operators is not continuous with respect to weak convergence.
The most that can be said is
(a) if $T_n \rightharpoonup T$ then $T_n S \rightharpoonup TS$ and $ST_n \rightharpoonup ST$,
(b) if $T_n \rightharpoonup T$ and $S_n x \to Sx$ for all $x$, then $T_n S_n \rightharpoonup TS$,
(c) if $\forall\phi \in X^*$, $\phi S_n \to \phi S$ and $T_n \rightharpoonup T$ then $S_n T_n \rightharpoonup ST$.

12. (a) For Banach spaces, if $T_n^\top \rightharpoonup T^\top$ then $T_n \rightharpoonup T$ (but not conversely).
(b) For Hilbert spaces, if $T_n \rightharpoonup T$ then $T_n^* \rightharpoonup T^*$ (weakly continuous).

13. If $T$ is compact then $x_n \rightharpoonup x \Rightarrow Tx_n \to Tx$. (Hint: $\{x_n\}$ must be bounded.)

14. If $x_n \rightharpoonup x$ in $X$, then $\phi \mapsto (\phi x_n)$ maps $X^*$ into $c$. For example, when $X$ is $\ell^1$,
this map converts bounded sequences to convergent ones.

15. Every closed linear subspace is weakly closed (by Proposition 11.18). Thus, if
$x_n \rightharpoonup x$, then there is a sequence $y_n \in [\![x_1, x_2, \ldots]\!]$ which converges in norm,
$y_n \to x$.

16. A set in $X^*$ is weak*-closed when it contains all weak*-limit points; for example,
$A^\perp$.

17. The strong limit of unitary isomorphisms $U_n$ between two Hilbert spaces is an
isometry $U$. But $U$ need not be unitary; e.g. let $U_n$ be defined on $\ell^2$ by
$$U_n(a_1, a_2, \ldots) = (a_{n+1}, a_1, a_2, \ldots, a_n, a_{n+2}, a_{n+3}, \ldots).$$
Then $U_n$ converges strongly to the right-shift operator $R$.

18. The Hadamard matrices are defined recursively by
$$T_1 := \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad T_{n+1} := \begin{pmatrix} T_n & T_n \\ T_n & -T_n \end{pmatrix}.$$
$S_n := T_n/2^{n/2}$ are $2^n \times 2^n$ unitary matrices; they can be extended
to unitary operators on $\ell^2$ by $U_n x := S_n x$ when $x \in M_n := [\![e_0, \ldots, e_{2^n-1}]\!]$,
and $U_n x := x$ when $x \in M_n^\perp$, and then $U_n \rightharpoonup 0$.

19. If a sequence of unitary isomorphisms $U_n$ converges weakly to $U$, then $\|U\| \le 1$.
If $U$ is known to be unitary, then the convergence is pointwise. (Hint: expand
$\|U_n x - Ux\|^2$.)
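
The Hadamard construction in the exercises above lends itself to a quick computation. The sketch below builds the matrices from the stated recursion, checks that the scaled matrices are orthogonal, and records the size of their entries; it works with finite matrices only, and the function names are hypothetical.

```python
def hadamard(n):
    """T_1 = [[1, 1], [1, -1]];  T_{n+1} = [[T_n, T_n], [T_n, -T_n]]."""
    T = [[1, 1], [1, -1]]
    for _ in range(n - 1):
        top = [row + row for row in T]
        bottom = [row + [-v for v in row] for row in T]
        T = top + bottom
    return T

def scaled(n):
    """S_n = T_n / 2^(n/2), a 2^n x 2^n matrix."""
    c = 2.0 ** (-n / 2.0)
    return [[c * v for v in row] for row in hadamard(n)]

def gram_error(S):
    """Largest entry of |S S^T - I|; it vanishes (up to rounding) for an orthogonal matrix."""
    size = len(S)
    worst = 0.0
    for i in range(size):
        for j in range(size):
            g = sum(S[i][k] * S[j][k] for k in range(size))
            worst = max(worst, abs(g - (1.0 if i == j else 0.0)))
    return worst

entry_sizes = [abs(scaled(n)[0][0]) for n in (1, 2, 4, 6)]   # every entry has this modulus
errors = [gram_error(scaled(n)) for n in (1, 2, 4)]
print(entry_sizes, errors)
```

Every entry of $S_n$ has modulus $2^{-n/2}$, so each fixed matrix coefficient $\langle e_i, U_n e_j\rangle$ tends to 0 once $2^n$ exceeds $i$ and $j$. Together with the bound $\|U_n\| = 1$ this is the mechanism behind $U_n \rightharpoonup 0$, although the code itself only inspects finitely many $n$.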


Remarks 11.43 


1. Not every closed subspace of a Banach space need be "complemented", e.g. the
space $\ell^\infty \ne c_0 \oplus M$ for any closed linear subspace $M$ (see Proposition 9.2 for the
definition of $c_0$) (see [38]). Indeed there exist infinite-dimensional Banach spaces
whose only complemented subspaces are the finite-dimensional or codimensional
closed ones [42].

2. It is a theorem that Hilbert spaces are the only Banach spaces in which every
closed subspace is complemented [40].

3. Weak convergence does not obey all the convergence properties of metric spaces.
For example, not every weak limit point of a set $M$ need have a sequence in $M$
that converges weakly to it.

4. There are yet other types of convergence. For example, $B(X, Y)$ is itself a Banach
space, and so there is weak convergence with respect to $B(X, Y)^*$, meaning
$\Phi T_n \to \Phi T$ for all $\Phi \in B(X, Y)^*$.


Chapter 12 
Differentiation and Integration 


12.1 Differentiation 


Although continuous linear transformations are stressed throughout the book (with
good reason, for they are the morphisms of normed spaces) they represent, of course,
a very special part of all the functions from one normed space to another. To put things
in perspective, recall that the linear maps on $\mathbb{R}$ are $x \mapsto \lambda x$, a very restricted set
of functions in comparison with the non-linear real continuous functions. However,
the linear maps are still relevant for one class of continuous functions: maps that
are 'locally linear', meaning that they can be approximated by linear operators up to
second-order errors:


Definition 12.1 


A function $f : X \to Y$ between normed spaces (over the same field) is said
to be (Fréchet) differentiable at $x$ when there is a continuous linear map
$f'(x) \in B(X, Y)$ such that for $h$ in a neighborhood of 0,
$$f(x + h) = f(x) + f'(x)h + o(h)$$
where $\|o(h)\|/\|h\| \to 0$ as $h \to 0$.

Note that $f$ need not be defined on all of $X$ but only on a neighborhood of $x$. The
set of functions $f : U \subseteq X \to Y$, where $U$ is an open subset of a normed space $X$
and $f$ is differentiable at all points $x \in U$, is here denoted $D(U, Y)$.

Proposition 12.2


The set of differentiable functions $D(U, Y)$ forms a vector space.

Differentiation $D : f \mapsto f'$ is a linear map, which takes composition of
functions to operator products,
$$(f + g)' = f' + g', \qquad (\lambda f)' = \lambda f',$$
$$(f \circ g)'(x) = f'(g(x))\,g'(x).$$

Note that the domain of $D$ is $D(U, Y)$ and its codomain is the vector space
of functions $\{g : U \to B(X, Y)\}$. The last identity is called the chain rule of
differentiation.


Proof The statements follow from the following identities and inequalities:
$$\begin{aligned}
(f + g)(x + h) &= f(x + h) + g(x + h) \\
&= f(x) + f'(x)h + o_f(h) + g(x) + g'(x)h + o_g(h) \\
&= f(x) + g(x) + (f'(x) + g'(x))h + (o_f(h) + o_g(h)),
\end{aligned}$$
$$\lambda f(x + h) = \lambda f(x) + \lambda f'(x)h + \lambda o(h),$$
$$\begin{aligned}
f \circ g(x + h) &= f(g(x + h)) \\
&= f\bigl(g(x) + g'(x)h + o_g(h)\bigr) \\
&= f(g(x)) + f'(g(x))\bigl(g'(x)h + o_g(h)\bigr) + o_f(h) \\
&= f(g(x)) + f'(g(x))\,g'(x)h + \bigl(f'(g(x))\,o_g(h) + o_f(h)\bigr),
\end{aligned}$$
where in the last identity $o_f$ is evaluated at the increment $g'(x)h + o_g(h)$, whose norm
is at most a multiple of $\|h\|$ for small $h$, so it is still $o(h)$. Finally,
$$\|o_f(h) + o_g(h)\| \le \|o_f(h)\| + \|o_g(h)\|,$$
$$\|\lambda o(h)\| = |\lambda|\,\|o(h)\|,$$
$$\|T o_g(h) + o_f(h)\| \le \|T\|\,\|o_g(h)\| + \|o_f(h)\|, \text{ for any } T \in B(X, Y). \qquad \square$$
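
The definition of differentiability and the chain rule can be probed numerically. The following sketch takes $f(x, y) = x^2 - y$ (the function treated in Examples 12.3) with candidate derivative $(2x, -1)$, and composes it with a path $g(t) = (\cos t, \sin t)$ that is purely an illustrative assumption; the increments and step sizes are arbitrary choices.

```python
import math

def f(x, y):
    return x * x - y

def f_prime(x, y):
    """Candidate derivative of f at (x, y), as the row vector (2x, -1)."""
    return (2.0 * x, -1.0)

def remainder_ratio(x, y, h, k):
    """|f(p + v) - f(p) - f'(p) v| / ||v|| for p = (x, y) and v = (h, k)."""
    a, b = f_prime(x, y)
    rem = f(x + h, y + k) - f(x, y) - (a * h + b * k)
    return abs(rem) / math.hypot(h, k)

# The ratio should shrink with the increment if f'(p) is the Frechet derivative.
ratios = [remainder_ratio(1.5, -0.5, t, 2.0 * t) for t in (1e-1, 1e-2, 1e-3)]

def g(t):
    """A path in R^2 (hypothetical choice)."""
    return (math.cos(t), math.sin(t))

def g_prime(t):
    return (-math.sin(t), math.cos(t))

def chain_rule(t):
    """(f o g)'(t) = f'(g(t)) g'(t)."""
    a, b = f_prime(*g(t))
    u, v = g_prime(t)
    return a * u + b * v

def central_difference(t, step=1e-6):
    """Symmetric difference quotient of f o g at t."""
    return (f(*g(t + step)) - f(*g(t - step))) / (2.0 * step)

print(ratios, chain_rule(0.7), central_difference(0.7))
```

Here the remainder is exactly $h^2$, so the ratio decreases linearly with the increment, and the chain-rule value agrees with the difference quotient up to discretization error.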


Examples 12.3 
1. The constant functions $f(x) := y_0$ are differentiable with $f' = 0$.


2. In $\mathbb{R}$ or $\mathbb{C}$, the functions $f(x) := x^n$ are differentiable with
$$f(x + h) = (x + h)^n = x^n + nx^{n-1}h + o(h),$$
so $f'(x) = nx^{n-1}$. Polynomials are thus differentiable.


3. Continuous linear maps are differentiable, $T(x + h) = Tx + Th$, so $T'(x) = T$.
A special case of the composition law is $(T \circ f)' = T \circ f'$ when $T$ is a fixed
operator.

4. The derivative of $F : \mathbb{R} \to \mathbb{R}^2$, $F(t) := \binom{f(t)}{g(t)} = f(t)\binom{1}{0} + g(t)\binom{0}{1}$ is $F'(t) = \binom{f'(t)}{g'(t)}$.
A differentiable path $r : \mathbb{R} \to X$ is called a curve. The direction of its
derivative $r'$ is called its tangent. The arclength of a curve is $\int ds := \int \|r'(t)\|\,dt$.


5. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) := x^2 - y$. Then $f'(x, y) : \mathbb{R}^2 \to \mathbb{R}$ is its
gradient $f'(x, y) = (2x, -1)$ since
$$f(x + h, y + k) = (x + h)^2 - (y + k) = (x^2 - y) + (2x \ \ {-1})\binom{h}{k} + h^2,$$
where $h^2 = o(h, k)$.
The map $(h, k) \mapsto (x_0^2 - y_0) + 2x_0 h - k$ gives the tangent plane to the surface
$z = f(x, y)$ at the point $(x_0, y_0, z_0)$.


6. A real inner product $\langle\cdot,\cdot\rangle : X^2 \to \mathbb{R}$ is differentiable,
$$\langle x + h, y + k\rangle = \langle x, y\rangle + (\langle x, k\rangle + \langle h, y\rangle) + \langle h, k\rangle.$$
The middle term is linear in $(h, k)$, and the last term is $o(h, k)$ by the Cauchy-
Schwarz inequality,
$$\frac{|\langle h, k\rangle|}{\|(h, k)\|} \le \frac{\|h\|\,\|k\|}{\|h\| + \|k\|} \le \|h\| \to 0 \text{ as } (h, k) \to (0, 0).$$


7. We often write $D_v f(x) := f'(x)v$. Note that
$$D_{v+w}f = D_v f + D_w f, \qquad D_{\lambda v}f = \lambda D_v f.$$
Because of this last property, $v$ is usually taken to be a unit vector.

When $X = \mathbb{R}$ there are only two unit vectors, $v = \pm 1$, and the notation used is
$\partial := D_1$, for the derivative in the positive direction. Similarly, for $\mathbb{C}$, $\partial := D_1$.
In $\mathbb{R}^N$, the standard basis consists of $N$ unit vectors $e_n$, and we define $\partial_n := D_{e_n}$.


8. For $X = \mathbb{R}$, the derivative can be taken to be a function $f' : \mathbb{R} \to Y$, since
$B(\mathbb{R}, Y) \cong Y$.


9. ▸ Differentiable functions are continuous in $x$, in fact are Lipschitz in a
neighborhood of any point
$$\|f(y) - f(x)\| = \|f'(x)(y - x) + o(y - x)\| \le c\|y - x\|.$$
In particular, $f(y) \to f(x)$ as $y \to x$. But there are Lipschitz functions, such
as $x \mapsto |x|$ on $\mathbb{R}$, that are not differentiable.


10. * The set of functions $f \in C(\mathbb{R})$ with bounded continuous derivatives, $f' \in
C(\mathbb{R})$, is denoted by $C^1(\mathbb{R})$. It is a non-closed linear subspace of $C(\mathbb{R})$. However,
it can be given a complete norm
$$\|f\|_{C^1} := \|f\|_C + \|f'\|_C.$$
Differentiation is then an operator $C^1(\mathbb{R}) \to C(\mathbb{R})$.


Proof The functions $\sin nx$ have unit norms in $C(\mathbb{R})$, but their derivatives $n\cos nx$
have arbitrarily large $\infty$-norm. This allows us to define
$$f(x) := \sum_{n=1}^{\infty} \frac{1}{2^n}\sin 4^n x$$
with the partial sums $f_N$ converging in $C(\mathbb{R})$ (the series is absolutely convergent). But
this is an example of a nowhere-differentiable function (check it is not differentiable
at 0 at least), so although $f_N \in C^1(\mathbb{R})$ and $f_N \to f$ uniformly, $f \notin C^1(\mathbb{R})$.

$C^1(\mathbb{R})$ is complete: if $(f_n)$ is a Cauchy sequence in $C^1(\mathbb{R})$, then $(f_n)$ and $(f_n')$ are
Cauchy sequences in the complete space $C(\mathbb{R})$, so they converge to $f, g \in C(\mathbb{R})$. By
Proposition 9.18, $f' = g$ and $f_n \to f$ in $C^1(\mathbb{R})$.

That differentiation is continuous is trivial for this space:
$$\|Df\|_C = \|f'\|_C \le \|f\|_C + \|f'\|_C = \|f\|_{C^1}.$$

It is not continuous when $C^1(\mathbb{R})$ is considered as a subspace of $C(\mathbb{R})$.
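
The contrast between the two norms can be seen on the functions $\sin nx$ used in the proof above. The sketch below approximates sup norms on a grid over one period (an assumption that is adequate for these periodic functions); `sup_norm` and `rows` are hypothetical names.

```python
import math

def sup_norm(func, lo=0.0, hi=2.0 * math.pi, samples=20001):
    """Grid approximation of the sup norm of a 2*pi-periodic function."""
    step = (hi - lo) / (samples - 1)
    return max(abs(func(lo + i * step)) for i in range(samples))

rows = []
for n in (1, 4, 16, 64):
    f = lambda x, n=n: math.sin(n * x)           # unit norm in C(R)
    df = lambda x, n=n: n * math.cos(n * x)      # its derivative
    norm_f = sup_norm(f)
    norm_df = sup_norm(df)
    # (n, ||f||_C, ||Df||_C, ||Df||_C / ||f||_{C^1})
    rows.append((n, norm_f, norm_df, norm_df / (norm_f + norm_df)))

for row in rows:
    print(row)
```

The ratio $\|Df\|_C/\|f\|_C$ grows like $n$, so $D$ is unbounded with respect to the $C(\mathbb{R})$ norm, while $\|Df\|_C/\|f\|_{C^1}$ never exceeds 1.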


Proposition 12.4 


The kernel of $D$ on $D(X, Y)$ consists of the constant functions,
$$Df = 0 \ \Rightarrow\ f \text{ is constant.}$$


Proof We first identify the kernel when the differentiable functions are real, $g : \mathbb{R} \to \mathbb{R}$.
Suppose $g'(t) = 0$ for all $t \in [a, b]$, and let
$$G(t) := g(t) - \frac{(t - a)g(b) + (b - t)g(a)}{b - a},$$
also differentiable, with $G(a) = 0 = G(b)$, and
$$G(t + h) - G(t) = G'(t)h + o(h) = \frac{g(a) - g(b)}{b - a}h + o(h), \qquad t \in\ ]a, b[. \qquad (12.1)$$

$G$ is continuous on the compact set $[a, b]$, so it must have maximum and minimum
points. We can assume one of them to be inside $]a, b[$, for if they are at $a$ and $b$, then
trivially $G$ is 0 throughout $[a, b]$.

Now, on any minimum of $G$ within $]a, b[$, as $h$ changes sign from negative to
positive, $G(t_0 + h) - G(t_0)$ remains positive; on a maximum it remains negative.
From (12.1), this can only hold if $g(a) = g(b)$. As $a$ and $b$ are arbitrary, this shows
that $g$ is constant.

For $f' = 0$ on $X$, we can use functionals to reduce it to a real-valued function:
let $g(t) := \phi \circ f(tx)$ for any non-zero $x \in X$ and $\phi \in Y^*$. It is differentiable,
$$g(t + h) = \phi \circ f(tx + hx) = \phi(f(tx)) + \phi(f'(tx)hx) + \phi(o(hx)) = g(t) + o(h),$$
with derivative $g'(t) = 0$. By the first part, $g(t) = g(0) = \phi \circ f(0)$ constant. But
with $\phi$ and $x$ arbitrary, this shows that $f = f(0)$, a constant function. □
Exercises 12.5 
1. Show that for differentiable functions λ: ℝ → 𝔽, f, g: ℝ → X, T: ℝ →
B(X, Y),
(a) (λ(t)f(t))′ = λ′(t)f(t) + λ(t)f′(t),
(b) ⟨f, g⟩′ = ⟨f′, g⟩ + ⟨f, g′⟩,
(c) (F(f(t), g(t)))′ = ∂₁F(f(t), g(t))f′(t) + ∂₂F(f(t), g(t))g′(t),
(d) (T(t)f(t))′ = T′(t)f(t) + T(t)f′(t).
2. For a curve on the sphere r: [0, 1] → S², the tangent t at any point satisfies
t · r = 0.
3. » For a differentiable function f: ℝ^N → ℝ^M, f′ is the Jacobi matrix [∂_j f_i].


4. The derivative itself, f’(x), need not be continuous in x. For example, show 
that f(x) := x² sin(1/x) (and f(0) := 0) is differentiable at all points, yet its
derivative is not continuous at 0. 


5. If f: X — R is differentiable and has a maximum/minimum at x in some open 
set U ⊆ X, then f′(x) = 0.


6. L'Hôpital's rule: If f: ℝ → X, g: ℝ → 𝔽 are differentiable functions satisfying
f(a) = 0, g(a) = 0, but g′(a) ≠ 0, then

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}.$$


12.2 Integration for Vector-Valued Functions 


The construction of L¹(ℝ) can be extended to include functions f: ℝ → X, where
X is a Banach space, as done in Section 9.2. Briefly,


• a vector-valued characteristic function 1_E x maps t to x ∈ X when t ∈ E ⊆ ℝ
and to 0 otherwise;

• a simple function is a linear combination of vector characteristic functions on sets
of finite measure, in which case,

$$\int \sum_{n=1}^{N} 1_{E_n} x_n := \sum_{n=1}^{N} \mu(E_n)\, x_n.$$

The set of simple functions is a normed space with ‖s‖ := ∫ ‖s(t)‖_X dt.

• a function f: ℝ → X is integrable when it is the a.e.-limit of a Cauchy sequence
of simple functions, s_n → f a.e., ‖s_n − s_m‖ → 0 as n, m → ∞; its integral is

$$\int f := \lim_{n\to\infty} \int s_n.$$

• on a measurable set A ⊆ ℝ, ∫_A f := ∫ f 1_A, e.g. ∫_a^b f for A = [a, b].


Quoting the results of Section 9.2, 


Proposition 12.6 


For f, g: ℝ → X integrable,

(i) ∫(f + g) = ∫f + ∫g, ∫λf = λ∫f, λ ∈ 𝔽,
(ii) ‖∫f‖ ≤ ∫‖f(t)‖ dt,
(iii) ∫λ(t)x dt = (∫λ)x for λ ∈ L¹(ℝ), x ∈ X,
(iv) ∫Tf = T∫f for T ∈ B(X, Y).


Examples 12.7 


+ JB) = [0G +0) «= (ora em 


1 
‘ — lt _fi 1/72 
integrable. Similarly, [ € 5) dt = ee ia) : 


2. Any continuous function f: [a, b] → X is integrable, since

$$\int_a^b \|f(t)\|\,dt \le (b-a)\|f\|_C.$$

3. If f_n(t) → f(t) in X, uniformly in t ∈ [a, b], then ∫_a^b f_n → ∫_a^b f in X, since

$$\Big\|\int_a^b (f_n - f)\Big\| \le \int_a^b \|f_n(t) - f(t)\|\,dt \le \|f_n - f\|_{C[a,b]}\,(b-a).$$


The connection between differentiation and integration is one of the cornerstones
of classical mathematics. It remains valid for vector-valued functions:

Theorem 12.8 Fundamental Theorem of Calculus


If f: [a, b] → X is integrable, and continuous at t ∈ ]a, b[, then its integral
is differentiable at t, and

$$\frac{d}{dt}\int_a^t f = f(t).$$

If f′: [a, b] → X is continuous, then

$$\int_a^b f' = f(b) - f(a).$$


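Proof (i) The first part is a consequence of

$$\int_a^{t+h} f = \int_a^t f + f(t)h + \int_t^{t+h} (f(\tau) - f(t))\,d\tau$$

and

$$\Big\|\frac{1}{h}\int_t^{t+h} f(\tau)\,d\tau - f(t)\Big\| = \Big\|\frac{1}{h}\int_t^{t+h} (f(\tau) - f(t))\,d\tau\Big\| \le \frac{1}{|h|}\Big|\int_t^{t+h} \|f(\tau) - f(t)\|\,d\tau\Big| < \frac{\epsilon}{|h|}\Big|\int_t^{t+h} d\tau\Big| = \epsilon$$

for arbitrary ε > 0 and |h| sufficiently small, since f is continuous at t.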


(ii) For the second part, let F(t) := ∫_a^t f′. By (i) we obtain F′ = f′ on ]a, b[, so
their difference F(t) − f(t) must be a constant c. As F(a) = 0, c = −f(a). □
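The second part of the theorem can be checked numerically for a curve in ℝ³. This is a minimal sketch, assuming a midpoint-rule approximation of the integral; the curve `F` and the helper `integrate` are hypothetical choices made only for illustration.

```python
import math

def integrate(f, a, b, n=2000):
    """Midpoint-rule integral of a vector-valued f: [a, b] -> R^d (tuples)."""
    h = (b - a) / n
    total = [0.0] * len(f(a))
    for k in range(n):
        value = f(a + (k + 0.5) * h)
        total = [s + v * h for s, v in zip(total, value)]
    return total

def F(t):
    """A hypothetical curve in R^3, chosen only for illustration."""
    return (math.cos(t), math.sin(t), t * t)

def dF(t):
    """Its exact derivative."""
    return (-math.sin(t), math.cos(t), 2 * t)

a, b = 0.0, 2.0
lhs = integrate(dF, a, b)                       # integral of F' over [a, b]
rhs = [fb - fa for fb, fa in zip(F(b), F(a))]   # F(b) - F(a)
error = max(abs(l - r) for l, r in zip(lhs, rhs))
print(error)
```

The integral is taken componentwise here, which agrees with the definition via simple functions because X = ℝ³ is finite-dimensional.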


Proposition 12.9 Mean value theorem 


For a continuous function f: [a, b] → X,

$$\frac{1}{b-a}\int_a^b f(t)\,dt$$

belongs to the closed convex hull of f[a, b].


Proof The function is uniformly continuous (Proposition 6.17), so splitting [a, b]
into small enough intervals [t_n, t_{n+1}] of size h = (b − a)/N each (t_n := a + nh),
ensures that ‖f(t) − f(t′)‖ < ε whenever t, t′ are in the same sub-interval. This
means that f can be approximated uniformly by a simple function which takes the
value f(t_n) on the interval [t_n, t_{n+1}[, and its integral ∫_a^b f can be approximated to
within ε(b − a) by the sum

$$(f(a) + f(t_1) + \cdots + f(t_{N-1}))\,h.$$

Thus (1/(b − a)) ∫_a^b f is within ε of (f(a) + f(a + h) + ⋯ + f(b − h))/N, which belongs
to the convex hull of f[a, b]. Since ε is arbitrarily small, the result follows. □
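A short numerical illustration shows why the statement refers to the convex hull rather than to a single value f(ξ). This is a sketch under the assumption X = ℂ (viewed as a real Banach space) and f(t) = e^{it} on [0, 2π]; the helper `average` computes the averaged sum used in the proof.

```python
import cmath
import math

def average(f, a, b, n=1000):
    """The average (f(a) + f(a+h) + ... + f(b-h)) / N with h = (b - a) / N."""
    h = (b - a) / n
    return sum(f(a + k * h) for k in range(n)) / n

f = lambda t: cmath.exp(1j * t)      # traces the unit circle once
mean = average(f, 0.0, 2 * math.pi)
print(abs(mean))                      # numerically zero
```

The mean value is the center of the circle: it lies in the closed convex hull of f[0, 2π] (the closed unit disk), although |f(t)| = 1 for every t, so it is not attained by f at any point.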


Corollary 12.10 


For a continuously differentiable function f: [a, b] → X,

$$\frac{f(b) - f(a)}{b-a}$$

belongs to the closed convex hull of f′[a, b].


Proof

$$\frac{f(b) - f(a)}{b-a} = \frac{1}{b-a}\int_a^b f'.$$


Recall that f′ is a function U → B(X, Y); it may itself be differentiable,
with derivative denoted by f″(x) ∈ B(X, B(X, Y)). This Banach space is actually
isomorphic to the space of bilinear maps B(X², Y) via the identification
(T x₁)x₂ = T(x₁, x₂). Because of this, f″(x) is akin to an operator that converts a
pair of vectors of X into a vector in Y; in particular, f″(x)(h, h) makes sense, and
is often shortened into the form f″(x)h².

More generally, f⁽ⁿ⁾ is the nth derivative of f: it takes n vectors in X and outputs
a vector in Y. The set of n-times differentiable functions f: ℝ → X, with f⁽ⁿ⁾
continuous, is denoted by Cⁿ(ℝ, X).


Theorem 12.11 Taylor’s theorem 


For f ∈ Cⁿ(ℝ, X) (n = 1, 2, ...),

$$f(t+h) = f(t) + f'(t)h + \cdots + \frac{f^{(n)}(t)h^n}{n!} + o(h^n).$$

Proof As expected the proof proceeds by induction on n. To illustrate the idea behind
the inductive step, we only consider how the statement for n = 2 follows from that
for n = 1. Let f ∈ C²(ℝ, X), and let

$$F(s) := f(t+s) - f(t) - f'(t)s - f''(t)s^2/2!$$

We wish to show F(h) = o(h²). F is continuously differentiable in s because it
consists of sums and products of continuously differentiable functions, in fact

$$F'(s) = f'(t+s) - f'(t) - f''(t)s = o(s),$$

since f′ is differentiable. Using the above corollary, it follows that (F(h) − F(0))/h belongs
to the closed convex hull of F′[0, h], whose values are at most of order o(h). Since
F(0) = 0, we have F(h) = o(h²) as required.

The reader is invited to adapt this proof to show that if the statement is correct for
n then it is also true for n + 1. The case n = 1 is, of course, part of the definition of
the derivative. □
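The case n = 2 can be observed numerically. The sketch below assumes the illustrative curve f(t) = (cos t, sin t) in ℝ² with the Euclidean norm, and computes the remainder divided by h²; the function name `remainder_norm` is not from the text.

```python
import math

def remainder_norm(t, h):
    """Norm of f(t+h) - f(t) - f'(t)h - f''(t)h^2/2 for f(t) = (cos t, sin t)."""
    f = (math.cos(t + h), math.sin(t + h))
    f0 = (math.cos(t), math.sin(t))
    f1 = (-math.sin(t), math.cos(t))      # f'(t)
    f2 = (-math.cos(t), -math.sin(t))     # f''(t)
    r = [f[i] - f0[i] - f1[i] * h - f2[i] * h * h / 2 for i in range(2)]
    return math.hypot(r[0], r[1])

steps = (1e-1, 1e-2, 1e-3)
ratios = [remainder_norm(0.7, h) / h ** 2 for h in steps]
print(ratios)
```

The ratios shrink roughly in proportion to h, which is what o(h²) requires; for this smooth curve the remainder is in fact of order h³.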


Exercises 12.12 


1. Integration by parts: ∫_a^b f(t)F′(t) dt = [fF]_a^b − ∫_a^b f′(t)F(t) dt, where f: ℝ →
𝔽 and F: ℝ → X have continuous derivatives.


2. Change of variables: ∫_a^b f(x) dx = ∫_{y(a)}^{y(b)} F(y) (dx/dy) dy, where y: ℝ → ℝ has an
invertible continuous derivative, and F(y(x)) = f(x).


3. If f: [a, b] → M is continuous, where M is a closed linear subspace of X, then
∫_a^b f ∈ M.


4. The symbol o(h) satisfies ‖o(h)‖ ≤ c‖h‖ for h small enough, but not necessarily
‖o(h)‖ ≤ c‖h‖². However show that the latter inequality is true if f′(y) is
Lipschitz in y in some ball about x, by evaluating

$$\Big\|\int_0^1 \frac{d}{dt} f(x + th) - f'(x)h \,dt\Big\|.$$


Application: The Newton-Raphson Algorithm 


If it is required to find a vector x which solves f(x) = 0, where f is differentiable, 
we might start with a first estimate x and find a better approximation from 


$$0 = f(x+h) \approx f(x) + f'(x)h,$$


namely h = −f′(x)⁻¹f(x). This suggests the following iteration:

Proposition 12.13 The Newton-Raphson Method


Let x̃ be a zero of f and suppose that in a neighborhood of x̃, f is differentiable
with f′(x) Lipschitz in x and ‖f′(x)⁻¹‖ ≤ c. Then if x₀ is
sufficiently close to x̃, the iteration

$$x_{n+1} := x_n - f'(x_n)^{-1} f(x_n)$$

converges to x̃.


Proof The differentiability of f at x̃ states that for h = x_n − x̃, ‖h‖ < ε,

$$\begin{aligned}
f(\tilde x + h) &= f(\tilde x) + f'(\tilde x)h + o(h),\\
\therefore\ f(x_n) &= f'(x_n)h + (f'(\tilde x) - f'(x_n))h + o(h),\\
\therefore\ f'(x_n)^{-1} f(x_n) &= x_n - \tilde x + f'(x_n)^{-1}\big((f'(\tilde x) - f'(x_n))h + o(h)\big),\\
\therefore\ \|x_{n+1} - \tilde x\| &\le \frac{3ck}{2}\|h\|^2 = c'\|x_n - \tilde x\|^2,
\end{aligned}$$

where k is the Lipschitz constant of f′, c′ := 3ck/2, and ‖o(h)‖ ≤ ½k‖h‖² (Exercise 12.12(4)
above). If ε < 1/c′ then it implies firstly that if x_n belongs to B_ε(x̃), then so does
x_{n+1}, and secondly by induction it follows that ‖x_n − x̃‖ ≤ (c′‖x₀ − x̃‖)^{2ⁿ}/c′ → 0
as n → ∞. □


This method is very effective since it converges quadratically, as long as x₀ is
already close enough to the zero. In practice, other algorithms are utilized to perform a
broad search for a zero, and Newton's method is then used to rapidly home in on it.
Another caveat is that it may be computationally expensive: one has to calculate not
only the derivative f′(x) but effectively also its inverse. The methods that are most
often used employ modified iterations like x_{n+1} := x_n − H_n f(x_n), where H_n are
operators that approximate f′(x_n)⁻¹ but are easier to calculate.
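The quadratic convergence is easy to observe in the scalar case, where applying f′(x_n)⁻¹ is just a division. The following is a minimal sketch, not a general-purpose solver; the function name `newton` and the two test equations (x² = 2 from 1.5, and e^{iz} = 1 from 6) are chosen for illustration.

```python
import cmath

def newton(f, df, x0, steps):
    """Scalar Newton-Raphson: x_{n+1} = x_n - f(x_n) / f'(x_n).

    Works for real or complex numbers; for vectors the division would be
    replaced by solving a linear system with the derivative f'(x_n).
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

# Real case: solve x^2 = 2 starting from 1.5.
xs = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5, 5)
errors = [abs(x - 2 ** 0.5) for x in xs]

# Complex case: solve e^{iz} = 1 starting from z = 6.
zs = newton(lambda z: cmath.exp(1j * z) - 1,
            lambda z: 1j * cmath.exp(1j * z), 6, 5)

for x, e in zip(xs, errors):
    print(f"{x:.15f}  error {e:.2e}")
print(zs[-1])
```

Each error is bounded by a constant times the square of the previous one, so the number of correct digits roughly doubles per step until rounding error takes over.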


Examples 12.14 


1. To solve for e^{iz} = 1 close to z = 6, Newton's iteration can be applied to f(z) :=
e^{iz} − 1,

$$x_{n+1} = x_n + i(1 - e^{-ix_n})$$

x₀   6
x₁   6.27942 + 0.03983i
x₂   6.28334 − 0.00080i
x₃   6.28319 − 0.0i

Examples of other equations whose solutions are routinely found using this
method are (a) roots of polynomials e.g. x² = 2, (b) transcendental equations
such as x − sin x = 1 or x tan x = 1, (c) simultaneous non-linear equations,
e.g. x³ + y = 1, x + y² = 2.


2. The method can be used to find the minimum of a scalar differentiable function,
say on ℝ², which is equivalent to finding zeros of its derivative. For example, if
the function were exactly quadratic

$$f(x) = a + b\cdot x + \tfrac{1}{2}x^{\top}Ax$$

then the minimum occurs when Ax + b = 0, and Newton's method finds the
minimum point in one step: x₁ = −A⁻¹b. The more undulating a function is, the
more demanding it becomes to find the true minimum. Two challenging functions
that have served as benchmarks are the following

(a) (1 − x)² + 100(y − x²)² (Rosenbrock's valley),
(b) (x² + y − 11)² + (x + y² − 7)² (Himmelblau's function).


3. * To align two real-valued functions f and g as best as possible, one may find a
that minimizes ∫(f(x + a) − g(x))² dx. Expanding this out, then differentiating
in a, gives

$$\int (f(x) - g(x))^2 + 2(f(x) - g(x))f'(x)a + (f'(x)a)^2 + (f(x) - g(x))\,a^{\top}f''(x)a \,dx + o(a^2)$$

$$\Rightarrow\ \int (f(x) - g(x))f'(x) + (f'(x)a)f'(x) + (f(x) - g(x))f''(x)a \,dx + o(a) = 0$$

The Newton-Raphson estimate of a is

$$a = -\big(\langle f', f'\rangle + \langle f - g, f''\rangle\big)^{-1}\langle f - g, f'\rangle.$$

Letting f_{n+1}(x) := f_n(x + a), f₀(x) := f(x), and iterating aligns the two
functions. (You can try this out with f(x) = cos x and g(x) = cos(x + 1) over
the interval [0, 2π].) This method has been implemented to align images (when
x, a ∈ ℝ²), for example to compensate for video camera jitter from one frame to
the next.
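The suggested experiment of Example 3 can be sketched as follows, assuming the inner products are approximated by Riemann sums on a uniform grid over [0, 2π] and that the derivatives of f = cos are supplied in closed form. The names `inner` and `align`, and the grid size, are illustrative choices.

```python
import math

N = 2000                                   # sample points on [0, 2*pi)
XS = [2 * math.pi * k / N for k in range(N)]
DX = 2 * math.pi / N

def inner(u, v):
    """Riemann-sum approximation of the integral of u(x) v(x) over [0, 2*pi]."""
    return sum(u(x) * v(x) for x in XS) * DX

def align(g, shift=0.0, steps=6):
    """Newton-Raphson updates for the shift that aligns f = cos with g."""
    for _ in range(steps):
        f = lambda x, s=shift: math.cos(x + s)       # f_n(x) = f(x + total shift)
        df = lambda x, s=shift: -math.sin(x + s)     # f_n'
        d2f = lambda x, s=shift: -math.cos(x + s)    # f_n''
        diff = lambda x: f(x) - g(x)
        a = -inner(diff, df) / (inner(df, df) + inner(diff, d2f))
        shift += a                                   # f_{n+1}(x) = f_n(x + a)
    return shift

shift = align(lambda x: math.cos(x + 1))
print(shift)
```

Starting from no shift, the accumulated shift settles at the true offset 1 within a few iterations. For sampled data such as images, the derivatives would have to be estimated numerically, which this sketch does not attempt.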


12.3 Complex Differentiation and Integration 


Let X be a complex Banach space, then a differentiable function f: ℂ → X is also
called analytic, i.e., for all z, h,

$$f(z+h) = f(z) + f'(z)h + o(h).$$

The set of functions f: ℂ → ℂ which are analytic at all points z in an open set
U ⊇ A, is denoted by C^ω(A).


A function f: ℂ → X is integrable along a differentiable path w: [t₀, t₁] → ℂ,
when the composition f ∘ w: [t₀, t₁] → ℂ → X is integrable. Its integral is then

$$\int_w f(z)\,dz := \int_{t_0}^{t_1} f(w(t))\,w'(t)\,dt.$$

Notice that dz/i is along the normal to a path. Proposition 12.6 remains true, for
example property (ii) becomes

$$\Big\|\int_w f(z)\,dz\Big\| \le \int_w \|f(z)\|\,ds, \quad\text{where } ds := |w'(t)|\,dt.$$


Examples 12.15 


1. Along any curve w which starts at w(0) = a + bi and ends at w(1) = c + di,

$$\int_{a+bi}^{c+di} 1\,dz = \int_0^1 w'(t)\,dt = [w(t)]_0^1 = (c+di) - (a+bi).$$

More generally, ∫_{a+bi}^{c+di} f′(z) dz = ∫_0^1 f′(w(t))w′(t) dt = f(c + di) − f(a + bi) for f analytic
(with f′ continuous). Thus one can integrate analytic functions in the same
manner as real-valued functions.


2. The map z ↦ 1/z is analytic except at z = 0. On a circular path w(t) := re^{it},
0 ≤ t ≤ 2π,

$$\oint \frac{1}{z}\,dz = \int_0^{2\pi} \frac{1}{r}e^{-it}\,ire^{it}\,dt = 2\pi i$$

(independent of the radius). Thus the integral ∫_a^b (1/z) dz does not have a unique
answer, but depends on whether one traverses a path that passes above or below
the origin, and how often it loops around it. But otherwise ∮ (1/z) dz = 0.


3. Cauchy-Riemann equations: An analytic function f: ℂ → ℂ, x + iy ↦
u(x, y) + iv(x, y) satisfies the equations

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x},$$

since f′(z) = ∂u/∂x + i ∂v/∂x = ∂v/∂y − i ∂u/∂y, which can be obtained by comparing (for real h)

$$f(z+h) = u(x,y) + \frac{\partial u}{\partial x}h + iv(x,y) + i\frac{\partial v}{\partial x}h + o(h) = f(z) + f'(z)h + o(h),$$

$$f(z+ih) = u(x,y) + \frac{\partial u}{\partial y}h + iv(x,y) + i\frac{\partial v}{\partial y}h + o(h) = f(z) + f'(z)ih + o(h).$$

4. The conjugate map z ↦ z̄ is not analytic, since $\overline{z+h} = \bar z + \bar h$. Therefore, Re(z) =
(z + z̄)/2, Im(z) = (z − z̄)/2i, and |z|² = z z̄, are not analytic. Indeed the Cauchy-
Riemann equations can be written symbolically as ∂f/∂z̄ = 0, and interpreted as f
being independent of z̄.
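The path integrals in these examples can be approximated directly from the definition ∫ f(w(t))w′(t) dt. The following is a minimal sketch for circular paths only, assuming a uniform sum over the parameter t; the function name `circle_integral` and the sample count are illustrative.

```python
import cmath
import math

def circle_integral(f, center=0j, radius=1.0, n=2000):
    """Approximate the integral of f along w(t) = center + radius * e^{it},
    0 <= t <= 2*pi, as the sum of f(w(t)) w'(t) dt over n parameter values."""
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        w = center + radius * e          # point on the path
        dw = 1j * radius * e             # w'(t)
        total += f(w) * dw * dt
    return total

small = circle_integral(lambda z: 1 / z, radius=0.5)            # close to 2*pi*i
large = circle_integral(lambda z: 1 / z, radius=3.0)            # close to 2*pi*i
away = circle_integral(lambda z: 1 / z, center=2, radius=1.0)   # close to 0
square = circle_integral(lambda z: z * z)                       # close to 0
conj = circle_integral(lambda z: z.conjugate(), radius=2.0)     # close to 8*pi*i
print(small, large, away, square, conj)
```

The integral of 1/z is the same for both radii and vanishes for a circle that does not loop around the origin, while the non-analytic conjugate map gives 2πi r², which does depend on the radius.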


Cauchy’s Theorems 


Analytic functions f: C — X are profoundly different from the similar-looking 
functions f: R? — X that are simply differentiable over the reals. This is borne out 
by a string of results discovered by Augustin Cauchy in the 19th century. We will 
only present here the essential theorems (see [20] for a more thorough presentation).


Theorem 12.16 Cauchy’s Theorem 


Let Ω ⊂ ℂ be a bounded open set having a finite number of differentiable
curves as boundary. Let f be a function from ℂ into a Banach space, which
is analytic in Ω and on its boundary, then along these boundary curves,

$$\oint f(z)\,dz = 0.$$

Warning: the curves must be traversed in a consistent manner, say with the region
Ω to the left of each curve. A fully rigorous proof requires results that are too technical
to be presented in a simplified form (see [10]). These details will be disregarded in
favor of a more intuitive approach, both for this theorem and its corollaries.


Proof At any analytic point, f(z + h) = f(z) + f′(z)h + o(h), where o(h)/h → 0
as h → 0. So for any ε > 0 and |h| < δ small enough, we have ‖o(h)‖ < εδ. For
any closed curve L inside a disk B_δ(z₀) ⊆ Ω̄ we get, using Example 1 above,

$$\begin{aligned}
\oint_L f(w)\,dw &= \oint f(z_0 + z)\,dz\\
&= \oint f(z_0) + f'(z_0)z + o(z)\,dz\\
&= f(z_0)\oint 1\,dz + f'(z_0)\oint z\,dz + \oint o(z)\,dz\\
&= \oint o(z)\,dz
\end{aligned}$$

$$\therefore\ \Big\|\oint_L f(w)\,dw\Big\| \le \oint \|o(h)\|\,ds \le \epsilon\delta \times \mathrm{Perimeter}(L) \tag{12.2}$$

Each point z₀ ∈ Ω̄ might need a different δ, but since Ω̄ is compact, there is a
minimum δ that works at all points (as in Proposition 6.17).




The region Ω can be covered by an array of squares of side δ, as shown in the
diagram. The integral on the boundary ∂Ω can be split up into a sum of integrals
along the squares that are within Ω̄, except that when a square intersects the boundary
∂Ω, the integral is partly along the square and partly along the boundary. Each tiny
loop has perimeter at most 4δ + l, where l is the length of that part of the boundary
curve which lies inside the square.

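If Ω is enclosed in a square of side L, there are at most (L/δ)² squares in all, so
the sum of the integrals is at most

$$\begin{aligned}
\Big\|\oint_{\partial\Omega} f(w)\,dw\Big\| &\le \sum_i \Big\|\oint_{\square_i} f(w)\,dw\Big\|\\
&\le \sum_i \epsilon\delta(4\delta + l_i) \quad\text{by (12.2)}\\
&\le (4L^2 + \mathrm{Perimeter}(\Omega)\,\delta)\,\epsilon
\end{aligned}$$

With ε arbitrarily small, the integral must vanish. □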


Corollary 12.17 


If f is analytic in the interior Ω of a simple closed curve w, then the integral
∫_a^b f(z) dz is well-defined when a, b ∈ Ω, independent of the path taken
(within Ω).


Proof Any two paths inside Ω, from a to b, together form one or more simple closed
paths, inside which f is analytic. Hence the integral of f on this closed loop is 0. □


One of the surprising results of Cauchy's theorem is that the value of the integral
∮ f(z) dz is independent of the bounding curve itself, and depends only on interior “distant”
regions!

Augustin Louis Cauchy (1789-1857) studied under Lagrange
and Laplace as a military engineer, but decided to continue 
with mathematics. A staunch royalist, he replaced Monge at 
the Académie des Sciences after the fall of Napoleon. Although 
he published important papers in the fields of elasticity and 
waves, he became famous for his taught courses on analysis and 
calculus in the 1820s, in which he proved the diagonalization of 
real quadratic forms and pushed forward the new standards of 
rigor, e.g. limits, continuity, convergence. 


Fig. 12.1 Cauchy 


Corollary 12.18 Cauchy’s Residue Theorem 


The integral over a closed simple curve depends only on those regions
inside where f is not analytic,

$$\frac{1}{2\pi i}\oint f(z)\,dz = \sum_i \mathrm{Residue}_i(f).$$

Proof Enclose the non-analytic parts by a finite number of curves w_i (the outer
boundary curve γ already does this, but it may be possible to further isolate the
non-analytic parts) to form one analytic region, around which the integral is zero,

$$\oint_\gamma f(z)\,dz + \sum_i \oint_{w_i} f(z)\,dz = 0,$$

traversing each curve w_i in a clockwise direction.
The value of the integral around each non-analytic
region in a counter-clockwise direction may be
called a ‘residue’ of f. □


Because of this, the integral around a closed simple curve is often denoted by
∮ f(z) dz, without reference to the (counter-clockwise) path taken, as long as it is
clear from the context which non-analytic regions are included.

The simplest cases in which a function fails to be analytic are of isolated points,
called isolated singularities. An example of an isolated singularity a is a pole of order
n when the function is of the type f(z)/(z − a)ⁿ with f analytic in a neighborhood
of a and f(a) ≠ 0. A simple pole is a pole of order 1. All other isolated singularities
are called essential singularities. We shall see later that the residue of a function at
a pole of order n + 1 is f⁽ⁿ⁾(a)/n!, but what can be proved here is the case for a
simple pole:


Proposition 12.19 Cauchy’s Integral Formula 


If f: ℂ → X is analytic inside a simple closed path that contains a, then

$$f(a) = \frac{1}{2\pi i}\oint \frac{f(z)}{z-a}\,dz.$$


Proof The integrand f(z)/(z − a) is analytic except at z = a, so by Cauchy's
theorem the path of integration can be taken to be a small circle of radius r about a.
As f is analytic at a, we know f(a + w) = f(a) + f′(a)w + o(w), so

$$\frac{f(z)}{z-a} = \frac{f(a+w)}{w} = \frac{f(a)}{w} + f'(a) + \frac{o(w)}{w}.$$

Integrating around a closed simple path eliminates the constant function f′(a), and

$$\Big\|\frac{1}{2\pi i}\oint \frac{o(w)}{w}\,dw\Big\| \le \frac{1}{2\pi}\int_0^{2\pi} \frac{\|o(w)\|}{|w|}\,r\,d\theta < r\epsilon$$

if r is small enough that ‖o(w)‖/|w| < ε. Thus in the limit as we take smaller circles,
only the term (1/2πi) ∮ (f(a)/w) dw = f(a) is left. □
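Cauchy's integral formula lends itself to a direct numerical check. The sketch below assumes a circular path, the scalar case X = ℂ, and the test function f = exp; the name `cauchy_value` and the chosen point a are illustrative.

```python
import cmath
import math

def cauchy_value(f, a, center=0j, radius=2.0, n=400):
    """Approximate (1 / (2*pi*i)) * integral of f(z) / (z - a) dz
    over the circle |z - center| = radius, which must enclose a."""
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        z = center + radius * e
        total += f(z) / (z - a) * (1j * radius * e) * dt
    return total / (2j * math.pi)

a = 0.3 + 0.4j
approx = cauchy_value(cmath.exp, a)
print(approx, cmath.exp(a))
```

The value of f at an interior point is recovered purely from its values on the surrounding circle, which is the content of the proposition.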


Examples 12.20 


1. Interpreting the residue theorem in actual examples
often yields results that would be harder to obtain
otherwise. For example, the function e^{iz}/z has a
simple pole at 0 with residue 1. So using a contour
as shown in the diagram, we obtain

$$2\pi i = \int_r^R \frac{e^{ix}}{x}\,dx + \int_{-R}^{-r} \frac{e^{ix}}{x}\,dx + \int_0^{\pi} e^{iR(\cos\theta + i\sin\theta)}\,i\,d\theta + \int_{\pi}^{2\pi} e^{ire^{i\theta}}\,i\,d\theta.$$

As R → ∞ and r → 0, the imaginary part 2∫_r^R (sin x / x) dx + π converges to 2π,
to give

$$\lim_{\substack{r\to 0\\ R\to\infty}} \int_r^R \frac{\sin x}{x}\,dx = \pi/2.$$

2. Maximum modulus principle: If f: ℂ → ℂ is analytic and |f| has a local maximum
(or minimum) at a, then f is constant in a neighborhood of a. It follows that on a
compact subset K, |f| attains its maximum and minimum at the boundary of K.


Proof Using a circular path of any radius r,

$$|f(a)| = \Big|\frac{1}{2\pi i}\oint \frac{f(z)}{z-a}\,dz\Big| \le \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta \le |f(a)|$$

$$\therefore\ \int_0^{2\pi} |f(a)| - |f(z)|\,d\theta = 0$$

so |f(z)| = |f(a)| within the disk, which in turn implies f(z) is constant
(Exercise 12.21(4)). Let |f|⁻¹M be the subset of the interior of K where |f|
attains the maximum M := max_{z∈K} |f(z)|. It is open by the above, and closed
in K° (Exercise 3.12(11)), hence must contain whole components of K°, unless
empty. By continuity, |f| takes the same value M on the boundary.


3. We say that a function f has a zero of order n at a when f(z) = (z − a)ⁿg(z),
with g(a) ≠ 0, g analytic in a neighborhood of a.

If f: ℂ → ℂ has a zero (or pole) of order n at a, then f′/f has a simple pole at
a with residue n (resp. −n),

$$\frac{f'(z)}{f(z)} = \frac{n(z-a)^{n-1}g(z) + (z-a)^n g'(z)}{(z-a)^n g(z)} = \frac{n}{z-a} + \frac{g'(z)}{g(z)},$$

(g′/g is analytic at a). Thus (1/2πi) ∮ (f′/f) dz = n; more generally it equals the difference
between the number of zeros and poles (counted with their order) inside the curve
of integration.


4. Rouché's theorem: If p_n → f inside a closed simple curve γ, with f non-zero on
γ, then f and p_n have the same number of zeros inside γ, from some n onwards.
Proof As |f| has a non-zero minimum on γ, there is an n such that |p_n/f − 1| < 1
on γ. Let F := p_n/f, then (1/2πi) ∮_γ (F′/F) dz = (1/2πi) ∮_{F∘γ} (1/z) dz = 0, since F ∘ γ is a closed curve
inside the disk B₁(1), which excludes 0. By the previous example, this implies that F has the same number
of zeros as poles, that is, the zeros of p_n and of f are the same in number.
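The counting formula of Example 3 can be tried numerically. The sketch below assumes a circular path on which f has no zeros or poles, and takes f′ as a separately supplied function; the name `count_zeros_minus_poles` is illustrative. The first test function is cosh z − 2 cos z on the unit circle.

```python
import cmath
import math

def count_zeros_minus_poles(f, df, center=0j, radius=1.0, n=4000):
    """Approximate (1 / (2*pi*i)) * integral of f'(z) / f(z) dz over a circle."""
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        z = center + radius * e
        total += df(z) / f(z) * (1j * radius * e) * dt
    return total / (2j * math.pi)

f = lambda z: cmath.cosh(z) - 2 * cmath.cos(z)
df = lambda z: cmath.sinh(z) + 2 * cmath.sin(z)

zeros = count_zeros_minus_poles(f, df)                                    # close to 2
cubic = count_zeros_minus_poles(lambda z: z ** 3, lambda z: 3 * z ** 2)   # close to 3
pole = count_zeros_minus_poles(lambda z: 1 / z ** 2, lambda z: -2 / z ** 3)  # close to -2
print(zeros, cubic, pole)
```

A numerical count of this kind is only evidence, not a proof; the argument via Rouché's theorem is what establishes the number of zeros rigorously.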


Exercises 12.21 


1. Show that, along any closed curve in ℂ, ∮ 1 dz = 0 and ∮ z dz = 0, but on
a unit circle centered at the origin, ∮ Re(z) dz = πi.


2. If f_n(z) → f(z) in X for all z on a simple closed curve w, on which f_n and f
are continuous, then ∮_w f_n(z) dz → ∮_w f(z) dz.


3. Assuming u and v are sufficiently differentiable, deduce from the Cauchy-
Riemann equations that the real and imaginary parts of an analytic function
f = u + iv are harmonic,

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0.$$

4. Let f: ℂ → ℂ be analytic. Suppose |f| is constant in some open set, then f is
constant. (Hint: Differentiate |f|² = u² + v².)


5. Find the poles and residues of (a) e^{iz}/(z² + 1), (b) = ; ea Ae (c) (sin z)/z² (First
show (sin z)/z is analytic at 0).

1 2 1 
: : ¢ an dz = along a simple closed counter-clockwise path that 
2mi J z(z-—1 2 


includes 0, 1, but not −1.


7. Show
(a) ∫_0^{2π} dθ/(2 + cos θ) = 2π/√3 using f(z) = 1/(z² + 4z + 1), z(θ) = e^{iθ},
(b) J love) a dx = —_ 7 using {[@i= al oat: 
cme hed Leesa dx = % using f(z) := (1 — e!)/2z?. 
8. By applying Example 3 to f = e^{1/z}, prove that the order of any of its poles must
be zero. As this is impossible, the isolated singularities of f must be essential
singularities.


9. Use Rouché's theorem to show that cosh z − 2 cos z has 2 zeros in the unit disk,
assuming it equals its MacLaurin series.


Remarks 12.22 


1. The first use of the Newton-Raphson method was by the “Babylonians” who used
it to find square roots, x² = n. Newton's method was initially restricted to finding
roots of polynomials, and it was Simpson (1740) who described the iteration we
use today.


2. Cauchy's theorem for analytic functions is a special case of Green's or Stokes'
theorem ∮ F · dr = ∬ ∇ × F · dA. In this case, using the Cauchy-Riemann
equations,

$$\begin{aligned}
\oint f\,dz = \oint (u + iv)(dx + i\,dy) &= \oint u\,dx - v\,dy + i\oint v\,dx + u\,dy\\
&= -\iint \Big(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\Big)\,dA + i\iint \Big(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\Big)\,dA\\
&= 0
\end{aligned}$$

Part III 
Banach Algebras 


Chapter 13 
Banach Algebras 


13.1 Introduction 


We now turn our attention to the space of operators B(X). We have seen that it is a 
Banach space when X is one (Theorem 8.7), but additionally, one can compose, or 
multiply, operators in B(X). This extra structure turns the vector space B(X) into 
what is called an algebra. We shall mostly study these spaces as abstract algebras 
𝒳 without specific reference to them being spaces of operators, in order to include
other examples of algebras and to make some of the proofs clearer. Nonetheless, 
B(X) remains our primary interest, and accordingly, the elements of an algebra will 
be denoted in general by upper-case letters T, S,... to remind us of operators and 
to distinguish them from mere vectors x. 


Definition 13.1 


A unital Banach algebra 𝒳 is a Banach space over ℂ that has an associative
multiplication of vectors with unity 1, such that for all R, S, T ∈ 𝒳, λ ∈ ℂ,

$$\begin{gathered}
(R+S)T = RT + ST, \qquad R(S+T) = RS + RT,\\
(\lambda S)T = \lambda(ST) = S(\lambda T),\\
\|ST\| \le \|S\|\|T\|, \qquad \|1\| = 1.
\end{gathered}$$


Throughout this book, a Banach algebra will mean a unital Banach algebra. Of
course, Banach algebras over ℝ are also of interest, and all the results in this chapter
apply to them in modified form; but complex scalars are necessary for an adequate
spectral theory of 𝒳.

Easy Consequences


1. 1 is unique, because 1′ = 1′1 = 1 for any other unity 1′.


2. T is said to be invertible (or regular) when there is an element S, called its inverse,
such that ST = 1 = TS. The inverse of T is unique when it exists, and is denoted
T⁻¹. If AT = 1 = TB then A = A(TB) = (AT)B = B so A = T⁻¹.


3. (S + T)² = S² + ST + TS + T², and more generally,
(S + T)ⁿ = Sⁿ + (Sⁿ⁻¹T + ⋯ + TSⁿ⁻¹) + ⋯ + (STⁿ⁻¹ + ⋯ + Tⁿ⁻¹S) + Tⁿ.
4. ‖Tⁿ‖ ≤ ‖T‖ⁿ.


Proposition 13.2 


Multiplication, (T, S) ↦ TS, is a differentiable map.


Proof In the identity

$$(T+H)(S+K) = TS + (TK + HS) + HK, \tag{13.1}$$

the map (H, K) ↦ TK + HS, 𝒳² → 𝒳, is linear and continuous, and HK is of
lower order, since

$$\begin{aligned}
\|TK + HS\| &\le \|T\|\|K\| + \|H\|\|S\| \le \max(\|T\|, \|S\|)(\|H\| + \|K\|),\\
\|HK\| &\le \|H\|\|K\| \le (\|H\| + \|K\|)^2 = \|(H,K)\|^2.
\end{aligned}$$

Of course, every differentiable map is continuous. □


Examples 13.3 
1. ℂ^N with the ∞-norm and the following pointwise multiplication and unity:

$$\begin{pmatrix} a_1\\ \vdots\\ a_N \end{pmatrix}\begin{pmatrix} b_1\\ \vdots\\ b_N \end{pmatrix} := \begin{pmatrix} a_1b_1\\ \vdots\\ a_Nb_N \end{pmatrix}, \qquad 1 := \begin{pmatrix} 1\\ \vdots\\ 1 \end{pmatrix}.$$

2. » ℓ^∞ with pointwise multiplication xy, and unity 1 = (1, 1, ...) (Exercise
9.4(3)).


3. C(K), the space of continuous functions on a compact set K, with pointwise
multiplication fg(x) := f(x)g(x), and unity the constant function 1. For example,
C[0, 1] is a space of paths in the complex plane.


4. » ℓ¹ with the convolution product; unity is e₀ = (1, 0, ...) (Exercise 9.7(2)).


5. The space L¹(ℝ) with convolution as a product; although it does not have a unity,
we can artificially add δ, called Dirac's “function”, such that δ ∗ f := f =: f ∗ δ.
(To make this rigorous, one needs to consider L¹(ℝ) × ℂ with elements (f, a)
representing f + aδ.)


The above examples happen to be commutative, i.e., ST = TS holds. But this
is not assumed in general. For example, T² − S² ≠ (T − S)(T + S) in general.


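6. » B(X) for any Banach space X; the product is operator composition (Proposition 8.8).


7. » If 𝒳 and 𝒴 are Banach algebras, then so is 𝒳 × 𝒴 with

$$\begin{pmatrix} S_1\\ T_1 \end{pmatrix}\begin{pmatrix} S_2\\ T_2 \end{pmatrix} := \begin{pmatrix} S_1S_2\\ T_1T_2 \end{pmatrix}, \qquad 1 := \begin{pmatrix} 1_{\mathcal X}\\ 1_{\mathcal Y} \end{pmatrix}, \qquad \Big\|\begin{pmatrix} S\\ T \end{pmatrix}\Big\| := \max(\|S\|_{\mathcal X}, \|T\|_{\mathcal Y}).$$

8. Every normed algebra can be completed to a Banach algebra.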


Proof Using the notation of Proposition 7.17, if T = [T_n] and S = [S_n], let
ST := [S_nT_n] and 1 := [1]. Note that S_nT_n is a Cauchy sequence by

$$\begin{aligned}
\|S_nT_n - S_mT_m\| &\le \|S_nT_n - S_nT_m\| + \|S_nT_m - S_mT_m\|\\
&\le \|S_n\|\|T_n - T_m\| + \|S_n - S_m\|\|T_m\|\\
&\le c(\|S_n - S_m\| + \|T_n - T_m\|).
\end{aligned}$$

Hence

$$\begin{aligned}
R(ST) &= [R_n(S_nT_n)] = [(R_nS_n)T_n] = (RS)T,\\
\lambda(ST) &= [\lambda(S_nT_n)] = (\lambda S)T = S(\lambda T),\\
\|ST\| &= \lim_{n\to\infty}\|S_nT_n\| \le \lim_{n\to\infty}\|S_n\|\|T_n\| = \|S\|\|T\|.
\end{aligned}$$


9. The polynomials ℂ[z] on B_ℂ with the ∞-norm form an incomplete algebra. As
we shall see shortly, its completion is the space of analytic functions C^ω(B_ℂ).
More general is the tensor algebra, consisting of polynomials and series in N
non-commuting variables.


10. » If ST = 0 and S is invertible, then T = 0. But there may exist non-zero
non-invertible elements S, T, called divisors of zero, for which ST = 0. Note
that TS need not also be 0, so S and T are more precisely called left and right
divisors of zero, respectively.


11. » The product of invertible elements is invertible, with (ST)⁻¹ = T⁻¹S⁻¹.
Also, (T⁻¹)⁻¹ = T. If Tⁿ is invertible, for some n ≥ 1, then so is T.


But it is possible for two non-invertible elements to have an invertible product,
i.e., ST invertible ⇏ T invertible (unless TR is also invertible for some R). In
particular, ST = 1 by itself is not enough to ensure T and S are invertible. For
example, in B(ℓ¹), the product of the (non-invertible) shift-operators is LR = I.


12. Suppose an element satisfies some non-zero polynomial, p(T) = 0. The unique
such polynomial of minimum degree and leading coefficient 1 is called its minimal
polynomial p_m. It divides all other polynomials p such that p(T) = 0.


Proof There cannot be two minimal polynomials, p_m and p, otherwise p_m − p
has a lesser degree than both and p_m(T) − p(T) = 0. If p(T) = 0, then
p = q p_m + r by the division algorithm of polynomials. As r has a strictly
smaller degree than p_m, yet r(T) = p(T) − q(T)p_m(T) = 0, it must be the
zero polynomial.


13. The derivative of the map T ↦ ST is S. Similarly the derivative of T ↦ Tⁿ is

$$H \mapsto HT^{n-1} + THT^{n-2} + \cdots + T^{n-1}H.$$

Because of commutativity, this simplifies to (zⁿ)′ = nzⁿ⁻¹ in ℂ. Thus, any
polynomial in T is differentiable in T.
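Two of the points above, non-commutativity and the derivative of T ↦ Tⁿ, can be checked in the smallest non-commutative example, the algebra of 2×2 matrices. This is a minimal sketch assuming matrices stored as nested tuples and the norm induced by the ∞-norm on ℂ² (largest absolute row sum); the helper names and the particular matrices S, T, H are arbitrary illustrations.

```python
def mul(A, B):
    """Product of 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def add(A, B, scale=1.0):
    """Return A + scale * B."""
    return tuple(tuple(A[i][j] + scale * B[i][j] for j in range(2)) for i in range(2))

def norm(A):
    """Operator norm induced by the infinity-norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def power(T, n):
    """T^n for n >= 0."""
    P = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        P = mul(P, T)
    return P

S = ((0.0, 1.0), (0.0, 0.0))
T = ((1.0, 2.0), (3.0, 4.0))
H = ((0.5, -1.0), (2.0, 0.25))

# Non-commutativity: T^2 - S^2 differs from (T - S)(T + S) by ST - TS.
lhs = add(power(T, 2), power(S, 2), -1.0)
rhs = mul(add(T, S, -1.0), add(T, S))
gap = norm(add(lhs, rhs, -1.0))

# Derivative of T -> T^3 in the direction H: H T^2 + T H T + T^2 H.
derivative = add(add(mul(H, power(T, 2)), mul(mul(T, H), T)), mul(power(T, 2), H))
eps = 1e-6
difference = add(power(add(T, H, eps), 3), power(T, 3), -1.0)
residual = norm(add(difference, derivative, -eps)) / eps
print(gap, residual)
```

The non-zero `gap` is the norm of the commutator ST − TS, and the small `residual` shows that (T + εH)³ − T³ agrees with ε(HT² + THT + T²H) up to terms of higher order in ε.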


Subalgebras and Ideals 


Definition 13.4 


A subalgebra of an algebra 𝒳 is a subset which is itself an algebra with
the same (induced) addition, scalar multiplication, product, and unity. It is a
Banach subalgebra (with the induced norm) when it is also complete.

An ideal is a linear subspace ℐ such that ST, TS ∈ ℐ for any T ∈ 𝒳,
S ∈ ℐ.


To show that a non-empty subset 𝒜 is a subalgebra of 𝒳, one need only show
closure of the various operations, i.e., for any S, T ∈ 𝒜, S + T ∈ 𝒜, λT ∈ 𝒜,
ST ∈ 𝒜, 1 ∈ 𝒜. The required properties of the induced operations are obviously
inherited from those of 𝒳.


Examples 13.5 


1. ℂ is embedded in every (complex) Banach algebra as ℂ1 = {z1 : z ∈ ℂ}. In
fact, it is customary to write z when we mean z1.


2. An element T generates the subalgebra of polynomials

$$\mathbb{C}[T] := \{a_0 + a_1T + \cdots + a_nT^n : a_0, \ldots, a_n \in \mathbb{C},\ n \in \mathbb{N}\}.$$

More generally, a finite number of commuting elements T₁, ..., T_n generate
the commutative algebra ℂ[T₁, ..., T_n], which may contain, for example,
1-2%)+T7/N. 


3. The algebra ℓ^∞ contains the closed ideal c₀.


Proof That c₀ is a closed linear subspace of ℓ^∞ is proved in Proposition 9.2. Let
(a_n) ∈ ℓ^∞, (b_n) ∈ c₀, then (a_n)(b_n) ∈ c₀ since

$$\Big|\lim_{n\to\infty} a_nb_n\Big| \le \sup_n |a_n| \lim_{n\to\infty} |b_n| = 0.$$

We will see later that every commutative Banach algebra, except ℂ, has non-trivial
ideals (Example 14.5(4)).


4. The center 𝒳′ := {T : ST = TS, ∀S ∈ 𝒳} is a commutative closed subalgebra
of 𝒳.


Proof If T₁, T₂ ∈ 𝒳′, then

$$\begin{gathered}
S(T_1 + \lambda T_2) = ST_1 + \lambda ST_2 = T_1S + \lambda T_2S = (T_1 + \lambda T_2)S,\\
S(T_1T_2) = T_1ST_2 = (T_1T_2)S, \qquad S1 = S = 1S.
\end{gathered}$$

The algebra is commutative by definition of 𝒳′.


5. » Proper ideals do not contain 1, or any other invertible element T, otherwise
they would have to contain every element S = (ST⁻¹)T. (However, as remarked in
Example 13.3(11), the set of non-invertible elements need not be an ideal, or even
a subspace.)


6. A closed ideal ℐ gives rise to a quotient algebra 𝒳/ℐ with multiplication and unity
defined by

$$(S + \mathcal{I})(T + \mathcal{I}) := ST + \mathcal{I}, \qquad 1 + \mathcal{I}.$$

7. A maximal ideal is a proper ideal ℐ for which the only other ideal containing it
is 𝒳 itself,

$$\mathcal{I} \subseteq \mathcal{J} \subseteq \mathcal{X} \ \Rightarrow\ \mathcal{J} = \mathcal{I} \ \text{or}\ \mathcal{J} = \mathcal{X}.$$

Maximal ideals are necessarily closed, assuming that the closure of a proper ideal
is also a proper ideal (Example 13.22(3)).


8. * Every proper ideal is contained in a maximal ideal.


Proof Let 𝒞 be the collection of all proper ideals that contain the proper ideal
ℐ. By Hausdorff's maximality principle, 𝒞 contains a maximal chain of nested
ideals ℐ_α. Then ℳ := ⋃_α ℐ_α is an ideal, since if T ∈ ℐ_α and S ∈ ℐ_β ⊆ ℐ_α, say,
then S + T ∈ ℐ_α ⊆ ℳ, and for any S ∈ 𝒳, both ST and TS are in ℐ_α ⊆ ℳ.
It is obvious that ℳ is proper and contains ℐ since 1 ∉ ℐ_α ⊇ ℐ for every α, and
that ℳ is maximal since the chain ℐ_α is maximal.


Morphisms 


Definition 13.6 


A morphism Φ: 𝒳 → 𝒴 of Banach algebras is a continuous linear map
(preserving limits, addition, and scaling) which preserves multiplication and
the unity,

$$\Phi(ST) = \Phi(S)\Phi(T), \qquad \Phi(1_{\mathcal X}) = 1_{\mathcal Y}.$$

A character is a Banach algebra morphism φ: 𝒳 → ℂ. The set of characters,
denoted by Δ, is called the character space, or spectrum, of 𝒳.


Examples 13.7 


1. Invertible elements of 𝒳 are mapped by algebra morphisms to invertible elements
of 𝒴,

$$\Phi(T)^{-1} = \Phi(T^{-1}),$$

since Φ(T)Φ(T⁻¹) = Φ(TT⁻¹) = Φ(1) = 1 and similarly, Φ(T⁻¹)Φ(T) = 1.


2. » The kernel of a Banach algebra morphism, ker Φ := {T : Φ(T) = 0}, is a
closed ideal. It is maximal when Φ ∈ Δ.


Proof If Φ(T) = 0, then Φ(ST) = Φ(S)Φ(T) = 0; similarly, Φ(TS) = 0.
Maximality: Let Φ: 𝒳 → ℂ be a morphism, and let the ideal ℐ contain ker Φ
as well as some T ∉ ker Φ. Then Φ(T) = λ ≠ 0, and Φ(λ − T) = 0; so
λ = (λ − T) + T ∈ ℐ, and ℐ must equal 𝒳 (Example 13.5(5) above).


(Every maximal ideal of a commutative Banach algebra is of the type ker φ with
φ ∈ Δ, but the proof requires Exercise 13.10(19) and Example 14.5(4); see the
proof of Theorem 14.38.)


3. An isomorphism of Banach algebras is defined to be an invertible morphism 
® : X — Ysuch that ©! is also a morphism. In fact, an invertible morphism is 
automatically an isomorphism. 


4. An automorphism of a Banach algebra V is an isomorphism from %* to itself. For 
example, the inner automorphisms T +> S~'TS, for any fixed invertible S. 


5. Since C is commutative, commutators [S, T] := ST − TS are mapped to 0 by characters (if they exist).


Representation in B(X)


Some mathematical theories contain a set of theorems stating that any abstract model 
of the theory can be represented concretely. For example, every group can be repre- 
sented by a permutation group, and every smooth manifold is embedded as a smooth 
“surface” of a Euclidean space. In this regard, every finite-dimensional Banach alge- 
bra can be embedded, or “faithfully represented”, as a matrix algebra, and more 
generally, we have the following representation theorem: 


Theorem 13.8 


Every Banach algebra can be embedded as a closed subalgebra of B(X), 
for some Banach space X. 


Proof The Banach space $X$ is the Banach algebra $\mathcal{X}$ itself without the product (although there may well be 'smaller' Banach spaces that fit the job). That is, the theorem claims that $\mathcal{X}$ is embedded in $B(\mathcal{X})$. To avoid confusion, we temporarily denote elements of $\mathcal{X}$ by lower-case letters, and the operators on them by upper-case letters.

Let $L_a(x) := ax$ be left-multiplication by $a$. Then $L_a \in B(\mathcal{X})$ since multiplication is distributive and continuous:

$$L_a(x + y) = a(x + y) = ax + ay = L_a(x) + L_a(y),$$
$$L_a(\lambda x) = a(\lambda x) = \lambda(ax) = \lambda L_a(x),$$
$$\|L_a(x)\| = \|ax\| \le \|a\|\|x\|,$$

so that $\|L_a\| \le \|a\|$. Furthermore,

$$L_{a+b}(x) = (a+b)x = ax + bx = L_a(x) + L_b(x), \qquad L_1(x) = 1x = x = I(x),$$
$$L_{\lambda a}(x) = (\lambda a)x = \lambda L_a(x), \qquad L_a(1) = a1 = a,$$
$$L_{ab}(x) = (ab)x = a(bx) = L_aL_b(x),$$

so $\|a\| = \|L_a1\| \le \|L_a\|\|1\| = \|L_a\|$ and $\|L_a\| = \|a\|$. These show that the mapping $L : \mathcal{X} \to B(\mathcal{X})$ defined by $L : a \mapsto L_a$ is an isometric morphism of Banach algebras. In fact, the space of such operators, $\operatorname{im} L$, is a closed subalgebra of $B(\mathcal{X})$ since isometries preserve completeness (Exercise 4.17(5)). Note that all the Banach algebra axioms have been used. □
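As an illustration of the left-multiplication representation, here is a minimal Python sketch for one small case: the group algebra of the cyclic group of order 3 with the 1-norm. The choice of group, the helper names (`convolve`, `left_mult_matrix`, `operator_one_norm`) and the sample vectors are assumptions made only for this example.

```python
N = 3  # order of the cyclic group Z_3 (illustrative choice)

def convolve(a, b):
    """Product in the group algebra: (a * b)_k = sum of a_g * b_h over g + h = k (mod N)."""
    c = [0] * N
    for g in range(N):
        for h in range(N):
            c[(g + h) % N] += a[g] * b[h]
    return c

def left_mult_matrix(a):
    """Matrix of L_a : x -> a * x with respect to the basis e_0, ..., e_{N-1}."""
    return [[a[(i - j) % N] for j in range(N)] for i in range(N)]

def matmul(A, B):
    """Product of two N x N matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def one_norm(a):
    """The 1-norm of a vector."""
    return sum(abs(x) for x in a)

def operator_one_norm(A):
    """Operator norm induced by the 1-norm: the largest absolute column sum."""
    return max(sum(abs(A[i][j]) for i in range(N)) for j in range(N))

a = [1, 2, -1]   # sample elements, chosen arbitrarily
b = [0, 3, 4]

# L is multiplicative: L_{ab} = L_a L_b.
print(left_mult_matrix(convolve(a, b)) == matmul(left_mult_matrix(a), left_mult_matrix(b)))
# L is isometric: ||L_a|| = ||a||.
print(operator_one_norm(left_mult_matrix(a)), one_norm(a))
```

Integer entries are used so that the comparisons are exact rather than subject to rounding.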


As one may anticipate, $B(X)$ and $B(Y)$ are not isomorphic as Banach algebras, when $X$ and $Y$ are not isomorphic as Banach spaces. The proof, however, is not as obvious as one might expect.

Theorem 13.9

Let $X$ and $Y$ be Banach spaces. A Banach algebra isomorphism $J : B(X) \to B(Y)$ induces a Banach space isomorphism $L : X \to Y$, such that

$$J(T) = LTL^{-1}.$$

Thus, every automorphism of $B(X)$ is inner.


Proof The idea is to establish a 1–1 correspondence between vectors $x \in X$ and certain projection-like operators $P_x \in B(X)$, and similarly $y \leftrightarrow R_y$ for $Y$; using the given mapping $J : T \mapsto \tilde{T}$, the sought isomorphism would then be

$$L : x \mapsto P_x \mapsto \tilde{P}_x = R_y \mapsto y.$$

The correspondence $x \leftrightarrow P_x$: For the remainder of the proof, fix a vector $a \in X$, $a \neq 0$, and a functional $\phi \in X^*$ such that $\phi a = 1$. Multiplying $x$ by $\phi$ gives an operator $P_x := x\phi$, that is, $P_xu := (\phi u)x$; conversely, multiplying $P_x$ with $a$ gives back the vector $P_xa = x\phi a = x$. The crucial characteristic of these operators is, for any $T \in B(X)$,

$$TP_x = Tx\phi = (Tx)\phi = P_{Tx}, \qquad P_{x_1+x_2} = (x_1 + x_2)\phi = P_{x_1} + P_{x_2}.$$

In particular $P_xP_a = x\phi a\phi = P_x$. Note that $\|P_x\| = \|x\phi\| \le \|x\|\|\phi\|$ and $\|x\| = \|P_xa\| \le \|P_x\|\|a\|$. Thus, $P : X \to B(X)$, $x \mapsto P_x$ is an embedding.

The isomorphism $J$ maps $P_a \in B(X)$ to a similar operator $R_b \in B(Y)$: The relation $P_a^2 = P_a$ is preserved by $J$, so $\tilde{P}_a := J(P_a)$ is a non-zero projection in $B(Y)$. Pick $b \in \operatorname{im}\tilde{P}_a$ and $\psi \in Y^*$ such that $\psi b = 1$ and $\psi\ker\tilde{P}_a = 0$ (Proposition 11.18), and define $R_y := y\psi$. $R_y$ satisfies analogous properties as $P_x$, such as $R_yb = y$ and $TR_y = R_{Ty}$. Now suppose $c \in \operatorname{im}\tilde{P}_a$, and let $T \in B(X)$ correspond to $R_c \in B(Y)$ under $J$; then $J$ transforms the identity

$$P_aTP_a = a(\phi Ta)\phi = \lambda P_a, \quad\text{where } \lambda = \phi Ta,$$
to $\tilde{P}_aR_c\tilde{P}_a = \lambda\tilde{P}_a$, so $\operatorname{im}\tilde{P}_a = [\![b]\!]$ since
$$c = \tilde{P}_ac = \tilde{P}_aR_cb = \tilde{P}_aR_c\tilde{P}_ab = \lambda\tilde{P}_ab = \lambda b.$$
Thus the projections $\tilde{P}_a$ and $R_b$ have the same image and the same kernel, and we can conclude that they are equal to each other.
Hence, the identity $P_x = P_xP_a$ becomes, in $B(Y)$,

$$\tilde{P}_x = \tilde{P}_xR_b = R_{\tilde{P}_xb} = R_y, \quad\text{where } y = \tilde{P}_xb.$$

The map $L : x \mapsto y = J(P_x)b$ is an isomorphism: That $L$ is linear, continuous, and 1–1 follow from:

$$L(x_1 + x_2) = J(P_{x_1+x_2})b = J(P_{x_1} + P_{x_2})b = L(x_1) + L(x_2),$$
$$L(\lambda x) = J(P_{\lambda x})b = J(\lambda P_x)b = \lambda L(x),$$
$$\|Lx\| = \|J(P_x)b\| \le \|J\|\|P_x\|\|b\| \le \|J\|\|\phi\|\|b\|\|x\|,$$
$$Lx = 0 \iff J(P_x)b = 0 \iff J(P_x) = 0 \iff P_x = 0 \iff x = 0.$$

Given any $y \in Y$, $J^{-1}$ maps the identity $R_y = R_yR_b$ to $S = SP_a = P_{Sa}$. So for $x := Sa$,
$$Lx = J(P_{Sa})b = R_yb = y,$$

and $L$ is onto. By the open mapping theorem (Theorem 11.1), $L$ is an isomorphism.

$\tilde{T} = LTL^{-1}$: $J$ maps the identity $TP_x = P_{Tx}$ to $\tilde{T}R_{L(x)} = R_{L(Tx)}$. Multiplying by $b$ to get the vector form, this reads $\tilde{T}Lx = LTx$ for all $x \in X$.
When $X = Y$, then $L \in B(X)$, and $J$ is an inner automorphism. □
Exercises 13.10 
1. Banach algebras of square matrices abound: the sets of matrices of type (; a) 


Oa ba —b 
and are Banach subalgebras of B (C?). 


2. C:=C? with (5) a = ia ae Y is a Banach algebra, with unity 


. (Hint: it is a matrix algebra in disguise.) 


f: ? ) ; ( P ) , Or ( a : ) are each closed under addition and multiplication, 


0 
3. Find examples of $2 \times 2$ matrix divisors of zero, $ST = 0 \neq TS$.

4. Show that in an $N$-dimensional algebra, every element has a minimal polynomial of degree at most $N$; e.g. every square matrix $A$ has a minimal polynomial. Show also how the Gram-Schmidt process (with respect to the inner product of Example 10.2(2)) can be applied to the sequence $I, A, A^2, \ldots$ to construct this minimal polynomial.

5. An idempotent satisfies $P^2 = P$. They are the projections in $B(X)$; what are they in $\mathbb{C}^N$ and $\ell^\infty$? The idempotents of $C[0, 1]$ are trivial. Show further that $P\mathcal{X}P$ is an algebra with unity $P$, called a "reduced algebra".


6. A nilpotent satisfies Q” = O for some n, e.g. (5 > and (; Ay In CY, 


$\ell^\infty$, and $C[0, 1]$, there are no nilpotents except zero. Find all the $2 \times 2$ matrix nilpotents of order 2, i.e., $Q^2 = 0$.

7. An element is cyclic when $T^n = 1$ for some $n$, e.g. $\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$. In $\mathbb{C}^N$ and $\ell^\infty$ they are sequences whose terms are of the type $e^{2\pi i m/n}$ for a fixed $n$.

8. The product of differentiable functions is again differentiable, with

$$(fg)'(T)H = [f'(T)H]\,g(T) + f(T)\,[g'(T)H].$$

This can be written in short as the familiar product rule $(fg)' = f'g + fg'$, provided it is remembered that the vector $H$ is acted upon by each derivative.

9. If $F : \mathbb{R} \to \mathcal{X}$ is integrable and $T \in \mathcal{X}$, then $\int F(t)T\,dt = (\int F)T$. (First show it true for simple functions.)

10. * Group Algebra: Let $G$ be a finite group of order $N$, and $\{e_g : g \in G\}$ be an orthonormal basis for $\mathbb{C}^N$; define $e_g * e_h := e_{gh}$, and extend the product to all other vectors by distributivity. The result is a Banach algebra $\mathbb{C}^G$ (or $\ell^1(G)$) with unity $e_1$ and the 1-norm. Every basis element is cyclic.

For example, the cyclic group $\{1, g : g^2 = 1\}$ gives rise to an algebra generated by $e_1 := \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_g := \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and the product

$$\begin{pmatrix} a \\ b \end{pmatrix} * \begin{pmatrix} c \\ d \end{pmatrix} = (ae_1 + be_g) * (ce_1 + de_g) = \begin{pmatrix} ac + bd \\ ad + bc \end{pmatrix}.$$


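11. The closure of a subalgebra is an algebra (use continuity of the product).

12. If $\mathcal{I}$ and $\mathcal{J}$ are ideals, then so are $\mathcal{I} + \mathcal{J}$ and $\mathcal{I} \cap \mathcal{J}$.

13. The center of $B(X)$ is $\mathbb{C}$. (Hint: Consider projections $x\phi$, for any $x \in X$, $\phi \in X^*$.)

14. » The centralizer or commutant of a subset $\mathcal{A} \subseteq \mathcal{X}$,
$$\mathcal{A}' := \{T : AT = TA,\ \forall A \in \mathcal{A}\}$$
is a closed subalgebra of $\mathcal{X}$. (In fact, when $\mathcal{X} = B(H)$, $\mathcal{A}'$ is weakly closed by Exercise 11.42(11a).)

Prove:

(a) $\mathcal{A} \subseteq \mathcal{B} \Rightarrow \mathcal{B}' \subseteq \mathcal{A}'$,

(b) $\mathcal{A} \subseteq \mathcal{A}''$ and $\mathcal{A}''' = \mathcal{A}'$,

(c) If $T \in \mathcal{A}'$ is invertible in $\mathcal{X}$ then $T^{-1} \in \mathcal{A}'$,

(d) If elements of $\mathcal{A}$ commute, then $\mathcal{A} \subseteq \mathcal{A}'$ and $\mathcal{A}''$ is a commutative Banach algebra.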


15. A left-ideal is a linear subspace $\mathcal{I}$ of $\mathcal{X}$ such that $T\mathcal{I} \subseteq \mathcal{I}$ for any $T \in \mathcal{X}$. Similarly, for a right-ideal, $\mathcal{I}T \subseteq \mathcal{I}$. For example, $\mathcal{X}S$ is a left-ideal, and $S\mathcal{X}$ is a right-ideal, but $\mathcal{X}S$ need not be an ideal. Instead, the ideal generated by $S$ is $[\![\mathcal{X}S\mathcal{X}]\!]$.


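16. Let $A$ be a closed subset of $[0, 1]$, then
$$\mathcal{I}_A := \{f \in C[0, 1] : \forall x \in A,\ f(x) = 0\}$$
is a closed ideal of $C[0, 1]$. Conversely, given a closed ideal $\mathcal{I}$ of $C[0, 1]$, let
$$A := \{x \in [0, 1] : \forall f \in \mathcal{I},\ f(x) = 0\},$$
then $\mathcal{I} = \mathcal{I}_A$. What are the maximal ideals?

17. Let $\mathcal{I}_A$ be a closed ideal of $C[0, 1]$, where $A$ is a closed subset. Then the mapping $f + \mathcal{I}_A \mapsto f|_A$ is an isomorphism $C[0, 1]/\mathcal{I}_A \cong C(A)$.

18. An algebra morphism $\Phi : \mathcal{X} \to \mathcal{Y}$ 'pulls' ideals $\mathcal{I}$ in $\mathcal{Y}$ to ideals $\Phi^{-1}\mathcal{I}$ in $\mathcal{X}$.

19. If $\mathcal{I}$ is a closed ideal, then $\Phi(T) := T + \mathcal{I}$ gives a Banach algebra morphism $\Phi : \mathcal{X} \to \mathcal{X}/\mathcal{I}$ with kernel $\ker\Phi = \mathcal{I}$.

20. The mapping $\sum_n a_nz^n \mapsto (a_n)$ from the set of power series converging absolutely on the closed unit disk $D$ of $\mathbb{C}$, considered as a subspace of $C(D)$, to $\ell^1$ is a 1–1 Banach algebra morphism.

21. Let $\sigma$ be a permutation of $1, \ldots, N$; then the mapping defined by $(z_1, \ldots, z_N) \mapsto (z_{\sigma(1)}, \ldots, z_{\sigma(N)})$ is an automorphism of $\mathbb{C}^N$.

22. For the group algebra $\mathbb{C}^G$, let $\sigma$ be an automorphism of the group $G$; then $e_g \mapsto e_{\sigma(g)}$ induces an automorphism on $\mathbb{C}^G$.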


23. The algebra $\mathbb{C}^N$ is embedded in $B(\mathbb{C}^N)$ as diagonal matrices. C is represented
by the matrices é - : 5) The group algebra C® is generated by the Cayley 
matrices of G. 

24. Show that every Banach algebra of dimension 2 (over $\mathbb{C}$) can be represented by the matrices generated from $I$ and $\begin{pmatrix} 0 & \alpha \\ 1 & \beta \end{pmatrix}$, where $\alpha$ is a fixed number and $\beta$ is 0 or 1. What are $\alpha$ and $\beta$ for the group algebra generated by $\{1, g : g^2 = 1\}$?

25. Let $\mathcal{X}$ be a Banach algebra contained in $B(X)$. Its unity $P = P^2$ is a projection, so $X = M \oplus N$ where $M = \operatorname{im} P$. For every $T \in \mathcal{X}$, $PT = T = TP$ implies $M$ is $T$-invariant and $TN = 0$, hence $\mathcal{X}$ acts on $M$.


13.2 Power Series


Definition 13.11 


A power series is a series $\sum_n a_nT^n$ where $a_n \in \mathbb{C}$ and $T \in \mathcal{X}$.

Recall that the root test can help determine whether such a series converges or not: if $\|a_nT^n\|^{1/n} = |a_n|^{1/n}\|T^n\|^{1/n}$ converges to a number less than 1, then the power series converges. It is important to know that $\|T^n\|^{1/n}$ converges:

Proposition 13.12

For any $T$ in a Banach algebra, the sequence $\|T^n\|^{1/n}$ converges to a number denoted by $\rho(T)$, where

$$\forall n \in \mathbb{N}, \quad \rho(T) \le \|T^n\|^{1/n} \le \|T\|.$$

Proof It is clear that $0 \le \|T^n\|^{1/n} \le \|T\|$. Let $\rho(T)$ be the infimum value of $\|T^n\|^{1/n}$, meaning that $\|T^n\|^{1/n}$ is bounded below by $\rho(T)$ and

$$\forall \epsilon > 0,\ \exists N, \quad \rho(T) \le \|T^N\|^{1/N} < \rho(T) + \epsilon.$$

Although the sequence $\|T^n\|^{1/n}$ is not necessarily decreasing towards $\rho(T)$, notice that $\|T^{qn}\|^{1/qn} \le \|T^n\|^{1/n}$. For any $n$, let $n = q_nN + r_n$ with $0 \le r_n < N$ (by the remainder theorem), then $0 \le r_n/n < N/n \to 0$ and $q_n/n = \frac{1}{N}(1 - \frac{r_n}{n}) \to \frac{1}{N}$ as $n \to \infty$, so that

$$\rho(T) \le \|T^n\|^{1/n} = \|T^{q_nN}T^{r_n}\|^{1/n} \le \|T^N\|^{q_n/n}\|T\|^{r_n/n} \to \|T^N\|^{1/N} < \rho(T) + \epsilon.$$

Since $\epsilon$ is arbitrarily small, this shows that $\|T^n\|^{1/n} \to \rho(T)$ from above. □


Examples 13.13 


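1. » (a) $\rho(1) = 1$, (b) $\rho(\lambda T) = |\lambda|\rho(T)$, (c) $\rho(ST) = \rho(TS)$, (d) $\rho(T^n) = \rho(T)^n$, since
$$\|(\lambda T)^n\|^{1/n} = |\lambda|\,\|T^n\|^{1/n},$$
$$\rho(ST) \le \|(ST)^n\|^{1/n} \le \|S\|^{1/n}\,\|(TS)^{n-1}\|^{1/n}\,\|T\|^{1/n} \to \rho(TS),$$
$$\|(T^n)^m\|^{1/m} = \big(\|T^{nm}\|^{1/nm}\big)^n \to \rho(T)^n \quad\text{as } m \to \infty.$$

But $\rho(T)$ may be 0 without $T = 0$; and $\rho(S + T) \not\le \rho(S) + \rho(T)$ in general (e.g. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$). So $\rho$ is not usually a norm on $\mathcal{X}$.

2. » $\rho(T) = \|T\| \iff \|T^n\| = \|T\|^n,\ \forall n \in \mathbb{N}$, since $\|T\| = \rho(T) \le \|T^n\|^{1/n} \le \|T\|$.

3. » If $\rho(T) < 1$, then $T^n \to 0$ (even though $\|T\|$ may be bigger than 1). If $\rho(T) > 1$, then $T^n \to \infty$.

Proof For $\epsilon$ small enough and $n$ large enough,

$$\|T^n\|^{1/n} < \rho(T) + \epsilon < 1 \ \Rightarrow\ \|T^n\| < (\rho(T) + \epsilon)^n \to 0, \quad\text{as } n \to \infty,$$
$$\|T^n\|^{1/n} \ge \rho(T) > 1 + \epsilon \ \Rightarrow\ \|T^n\| > (1 + \epsilon)^n \to \infty.$$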
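The behaviour in the last example can be observed numerically. The following is a minimal Python sketch, assuming a particular $2 \times 2$ test matrix and the Frobenius norm (which is submultiplicative, although it gives the identity norm $\sqrt{2}$ rather than 1; in finite dimensions the limit of $\|T^n\|^{1/n}$ does not depend on the norm chosen). The function names are made up for this illustration.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def frobenius_norm(A):
    """Square root of the sum of squared absolute values of the entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))

def root_norms(T, n_max):
    """Return the list of ||T^n||^(1/n) for n = 1, ..., n_max."""
    values = []
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, n_max + 1):
        power = matmul(power, T)
        values.append(frobenius_norm(power) ** (1.0 / n))
    return values

# Assumed test matrix: both eigenvalues are 0.5, but the norm is larger than 10.
T = [[0.5, 10.0], [0.0, 0.5]]
roots = root_norms(T, 200)

print(frobenius_norm(T))   # the norm of T itself is well above 1
print(roots[9])            # ||T^10||^(1/10), still above 1
print(roots[-1])           # ||T^200||^(1/200), approaching rho(T) = 0.5
```

The sequence approaches 0.5 slowly from above, which matches the statement that $T^n \to 0$ even though $\|T\| > 1$.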


Theorem 13.14 Cauchy-Hadamard 


The power series $\sum_{n=0}^\infty a_nT^n$, where $a_n \in \mathbb{C}$, $T \in \mathcal{X}$,

• converges absolutely when $\rho(T) < R$, and
• diverges when $\rho(T) > R$,

where $R := 1/\limsup_n |a_n|^{1/n}$ is called the radius of convergence of the series.

Proof This is a simple application of the root test. The $n$th root of the general term satisfies
$$\limsup_n \|a_nT^n\|^{1/n} = \limsup_n |a_n|^{1/n}\rho(T) = \rho(T)/R.$$
Thus, if $\rho(T) < R$, then the series converges absolutely, while if $\rho(T) > R$, then it diverges. Assuming $\mathcal{X}$ is complete, the power series converges or diverges accordingly. □


Examples 13.15 


1. Ratio test: If $|a_n|/|a_{n+1}| \to R$ then so does $|a_n|^{-1/n}$ (Section 7.5), hence $R$ would be the radius of convergence of $\sum_n a_nT^n$.

2. Some aspects of power series may seem mysterious from the point of view of real numbers: The series $1 - x^2 + x^4 - x^6 + \cdots$ has a radius of convergence of 1 and converges to $\frac{1}{1+x^2}$, which takes a finite value at all $x \in \mathbb{R}$ (but not at $x = i$). Moreover the same function can also be written as $(5 - (4 - x^2))^{-1} = \frac{1}{5}\sum_{n=0}^\infty \big(\frac{4 - x^2}{5}\big)^n$, yet in this form it converges in the larger range $-3 < x < 3$.

3. The theorem also applies to power series $\sum_n A_nz^n$, where $A_n$ is a sequence of elements in $\mathcal{X}$. The radius of convergence is then $1/\limsup_n \|A_n\|^{1/n}$.

4. When $|a_n| \le c$ for all $n$, then $a_2T^2 + a_3T^3 + \cdots = o(T)$ for small $T$, since it is bounded above by $c\|T\|^2/(1 - \|T\|)$.
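The two expansions of $1/(1+x^2)$ in the second example can be compared numerically. This is a small Python sketch under the assumption that 60 terms and the sample point $x = 2.5$ are adequate for illustration; the function names are hypothetical.

```python
def partial_sum_standard(x, terms):
    """Partial sum of 1 - x^2 + x^4 - ..., a geometric series in -x^2."""
    return sum((-x * x) ** n for n in range(terms))

def partial_sum_recentred(x, terms):
    """Partial sum of (1/5) * sum_n ((4 - x^2)/5)^n."""
    q = (4.0 - x * x) / 5.0
    return sum(q ** n for n in range(terms)) / 5.0

x = 2.5                       # outside |x| < 1 but inside |x| < 3
exact = 1.0 / (1.0 + x * x)

print(exact)                               # value of the function itself
print(partial_sum_recentred(x, 60))        # agrees with it closely
print(abs(partial_sum_standard(x, 60)))    # partial sums blow up
```

At $x = 2.5$ the ratio of the second series is $(4 - 6.25)/5 = -0.45$, so it converges geometrically, whereas the first series has ratio $-6.25$.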


When can a function be written as a power series? We wish to establish that being 
analytic in a neighborhood of 0 is a necessary and sufficient condition. The necessary 
part is the content of the following proposition, but sufficiency will be shown later 
(Theorem 13.26). 


Proposition 13.16 


A power series $f(z) := \sum_{n=0}^\infty a_nz^n$ is analytic strictly within its radius of convergence $R$, and

$$f'(z) = \sum_{n=1}^\infty na_nz^{n-1}.$$

Proof First of all, the power series $\sum_{n=1}^\infty na_nz^{n-1}$ converges, with the same radius of convergence $R$ as $\sum_n a_nz^n$,
$$\limsup_n |na_n|^{1/n} = \lim_{n\to\infty} n^{1/n}\,\limsup_n |a_n|^{1/n} = 1/R \quad\text{(Exercise 3.5(1d))}.$$

For each individual term of the given power series,
$$(z + h)^n = z^n + nz^{n-1}h + o_n(h).$$

It needs to be shown that $|\sum_n a_no_n(h)|/|h| \to 0$ as $h \to 0$. One trick is to find an alternative way of expanding $(z + h)^n$ as follows:

$$(z+h)^n = (z+h)^{n-1}h + (z+h)^{n-1}z$$
$$= (z+h)^{n-1}h + (z+h)^{n-2}zh + (z+h)^{n-2}z^2$$
$$= (z+h)^{n-1}h + (z+h)^{n-2}zh + \cdots + (z+h)z^{n-2}h + z^{n-1}h + z^n$$
$$\Rightarrow\quad |(z+h)^n - z^n| \le \big(|z+h|^{n-1} + \cdots + |z|^{n-1}\big)|h| \le nr^{n-1}|h|, \qquad (13.2)$$

where $r$ is larger than $|z| + |h|$ but smaller than $R$. Now,

$$o_n(h) = (z+h)^n - z^n - nz^{n-1}h$$
$$= (z+h)^{n-1}h + \cdots + (z+h)^{n-k}z^{k-1}h + \cdots + z^{n-1}h - \big(z^{n-1}h + \cdots + z^{n-1}h\big)$$
$$= \sum_{k=1}^n \big((z+h)^{n-k} - z^{n-k}\big)z^{k-1}h,$$

so, by (13.2),
$$|o_n(h)| \le (n-1)r^{n-2}|h|^2 + \cdots + r^{n-2}|h|^2 = \frac{n(n-1)}{2}r^{n-2}|h|^2.$$
But the series $c := \sum_{n=2}^\infty |a_n|\frac{n(n-1)}{2}r^{n-2}$ converges for $r < R$, so
$$\Big|\sum_{n=0}^\infty a_no_n(h)\Big| \le \sum_{n=2}^\infty |a_n||o_n(h)| \le c|h|^2,$$
which proves that the remainder term $\sum_n a_no_n(h)$ is $o(h)$. □


There are two important consequences: Since differentiating a power series gives 
another power series with the same radius of convergence, then we can differentiate 
repeatedly. Secondly, we know that polynomials are distinct as functions on C when 
they have different coefficients; this property remains valid for power series: If a 
function can be written as a power series, then its coefficients are unique to it. 


Proposition 13.17 


Assuming a strictly positive radius of convergence,

(i) a power series $f(z) := \sum_{n=0}^\infty a_nz^n$ is infinitely many times differentiable, and
$$a_n = \frac{f^{(n)}(0)}{n!},$$

(ii) distinct power series do not have identical coefficients.

By distinct power series is meant $\sum_n b_nT^n \neq \sum_n c_nT^n$ for at least one $T$.

Proof (i) By induction on $n$, $f^{(n)}$ has the power series

$$f^{(n)}(z) = n!\,a_n + (n+1)!\,a_{n+1}z + \frac{(n+2)!}{2!}a_{n+2}z^2 + \cdots$$

Substituting $z = 0$ gives the stated formula.


(ii) Suppose $\sum_n b_nT^n = \sum_n c_nT^n$ for all $T$ such that $\rho(T) < R$, the smaller of their radii of convergence. By taking the difference of the two series, it is enough to show that if $f(z) := \sum_n a_nz^n = 0$ for all $z \in B_R(0)$, then $a_n = 0$ for all $n$. But this is immediate from (i) since $f^{(n)}(0) = 0$ in this case. □

There are a couple of power series of supreme importance. As motivation, consider the possibility of converting addition in a Banach algebra to multiplication,

$$f(x + y) = f(x)f(y), \qquad f(0) = 1.$$

Apart from the constant function $f = 1$, are there any others? If $f$ exists, it would have to satisfy a number of properties:

(a) $f(nx) = f(x)^n$, $f(-x) = f(x)^{-1}$,

(b) When the algebra is $\mathbb{R}$, $f(m/n) = a^{m/n}$ where $a := f(1) > 0$ (Hint: $f(n/n) = f(1/n)^n$),

(c) $f$ is uniformly continuous on $\mathbb{Q} \cap [0, 1]$, so it can be extended to a continuous function on $\mathbb{R}$, usually denoted by $f(x) = a^x$,

(d) $f'(x) = f'(0)f(x)$ if $f$ is differentiable at 0, since $f(h) = 1 + f'(0)h + o(h)$ so

$$f(x + h) = f(x)f(h) = f(x) + f'(0)f(x)h + o(h);$$

consequently $f$ is infinitely many times differentiable with $f^{(n)}(x) = f'(0)^nf(x)$.
Taking the simplest case $f'(0) = 1$ (so $f^{(n)}(0) = 1$) leads to the following definition:

The exponential function is defined by

$$e^T := 1 + T + \frac{T^2}{2!} + \frac{T^3}{3!} + \cdots = \sum_{n=0}^\infty \frac{T^n}{n!}.$$

Its radius of convergence is $\liminf_n |a_n|^{-1/n} = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \infty$ by the ratio test, so $e^T$ exists for any $T$ and satisfies $\|e^T\| \le e^{\|T\|}$.

Similarly, starting with $f(xy) = f(x) + f(y)$, we are led to the logarithm function, defined by

$$\log(1 + T) := T - \frac{T^2}{2} + \frac{T^3}{3} - \cdots = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}T^n,$$

with radius of convergence $\liminf_n |a_n|^{-1/n} = \lim_{n\to\infty} \frac{n+1}{n} = 1$.


Proposition 13.18

When $S$, $T$ commute, $e^{S+T} = e^Se^T$. For $\rho(T) < 1$, $e^{\log(1+T)} = 1 + T$.

Proof (i) The product $e^Se^T$ can be obtained in table form as,

$$\begin{array}{c|cccc}
e^S \backslash e^T & 1 & T & \frac{1}{2!}T^2 & \cdots \\ \hline
1 & 1 & T & \frac{1}{2!}T^2 & \cdots \\
S & S & ST & \frac{1}{2!}ST^2 & \cdots \\
\frac{1}{2!}S^2 & \frac{1}{2!}S^2 & \frac{1}{2!}S^2T & \frac{1}{2!2!}S^2T^2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$

The general term in this array is $\frac{1}{n!m!}S^nT^m = \frac{1}{N!}\binom{N}{n}S^nT^{N-n}$, where $N := n + m$ labels the $N$th diagonal from the top left corner. This is precisely the $n$th term of the expansion of $\frac{1}{N!}(S + T)^N$ when $S$ and $T$ commute, so the array sum is $e^{S+T}$.

(ii) The second part can be (tediously) proved by making a power series expansion as above (Exercise 13.19(8)). We defer the proof until we have better tools available (Example 13.30(3)). □
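The role of commutativity in part (i) can be checked on small matrices. The following Python sketch sums a truncated exponential series for $2 \times 2$ matrices; the truncation at 40 terms, the sample matrices, and all function names are assumptions made for this illustration only.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def matexp(A, terms=40):
    """Sum of the first `terms` terms of the series sum_n A^n / n!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes A^n / n! by multiplying the previous term by A / n
        term = [[x / n for x in row] for row in matmul(term, A)]
        result = matadd(result, term)
    return result

def max_diff(A, B):
    """Largest absolute entrywise difference."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# A commuting pair: both are polynomials in the same nilpotent matrix.
S = [[1.0, 2.0], [0.0, 1.0]]
T = [[3.0, -1.0], [0.0, 3.0]]
commuting_gap = max_diff(matexp(matadd(S, T)), matmul(matexp(S), matexp(T)))

# A non-commuting pair of nilpotent matrices.
P = [[0.0, 1.0], [0.0, 0.0]]
Q = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = max_diff(matexp(matadd(P, Q)), matmul(matexp(P), matexp(Q)))

print(commuting_gap)      # rounding error only
print(noncommuting_gap)   # clearly non-zero
```

For the second pair, $e^Pe^Q = (1+P)(1+Q)$ has top-left entry 2, while $e^{P+Q}$ has top-left entry $\cosh 1$, so the identity genuinely fails without commutativity.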


Exercises 13.19 


1. Calculate $\rho(T)$ for the following matrices


O01 la a0 al 
@ (5) (95)-©(53)-@(54)- 


Only one of these examples satisfies p(T) = ||T'||. 


2. Every idempotent P, except 0, satisfies p(P) = 1; every nilpotent Q has 
p(Q) = 0, and every cyclic element T has p(T) = 1. 


3. For any invertible S, 0(S~!TS) = p(T), yet ||S~!TS|| may be much larger than 


.) and S$ := € t )ethen 5-1 PS = € -) 


1 
||T ||. For example, let P := ({ 0 0 00 


has norm V1 + |ac|?. 


4. If $ST = TS$, then $\rho(ST) \le \rho(S)\rho(T)$. Deduce $\rho(T^{-1})^{-1} \le \rho(T)$, and find examples of non-commuting matrices where $\rho(ST) > \rho(S)\rho(T)$.

5. The equation $T - ATB = C$ has a solution $T = \sum_{n=0}^\infty A^nCB^n$ if $\rho(A)\rho(B) < 1$.


6. The radii of convergence of

$$\sum_{n=0}^\infty n!\,T^n, \qquad \sum_{n=0}^\infty nT^n, \qquad \sum_{n=1}^\infty T^n/n, \qquad \sum_{n=0}^\infty T^n/n!$$

are $0, 1, 1, \infty$, respectively. A quick way of estimating the radius of convergence $R$ is to judge how fast the coefficients grow: if $c_0r_0^n \le |a_n| \le c_1r_1^n$ then $\frac{1}{r_1} \le R \le \frac{1}{r_0}$.


7. How are the radii of convergence of $\sum_n (a_n + b_n)T^n$ and $\sum_n a_nb_nT^n$ related to those of $\sum_n a_nT^n$ and $\sum_n b_nT^n$?

8. Let $f(T) := \sum_{n=0}^\infty a_nT^n$ and $g(T) := \sum_{n=0}^\infty b_nT^n$. Find the power series expansions of $f + g$, $fg$ and $f \circ g$. In particular, find the first few terms of $e^{\log(1+T)}$.

9. Let $f(T) := \sum_n a_nT^n$ be a power series, and $F(T) := \sum_n |a_n|T^n$; they have the same radius of convergence $R$. If $\|T\| < R$, then $\|f(T)\| \le F(\|T\|)$.

10. The convergence of a power series is uniform in $T$, for $\|T\| \le r < R$.

11. When $T$ satisfies a polynomial $p(T) = 0$, then every (convergent) power series on $T$ reduces to a polynomial in $T$.

12. (a) $e^0 = 1$, (b) the inverse of $e^T$ is $e^{-T}$, (c) $e^{nT} = (e^T)^n$.

13. By analogy with the complex case, define the hyperbolic and trigonometric functions of $T$ as power series, and show
(a) $e^{\left(\begin{smallmatrix} 0 & -1 \\ 1 & 0 \end{smallmatrix}\right)x} = \begin{pmatrix} \cos x & -\sin x \\ \sin x & \cos x \end{pmatrix}$,
(b) $\cos\left(\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}x\right) = \begin{pmatrix} \cos x & -x\sin x \\ 0 & \cos x \end{pmatrix}$,
(c) $e^T = \cosh T + \sinh T$,
(d) $e^{iT} = \cos T + i\sin T$.

14. Prove that there is a non-zero complex number $\alpha$ such that $e^\alpha = 1$. Thus the exponential function has a period, $e^{T+n\alpha} = e^T$. The smallest such number is $6.283\ldots i =: 2\pi i$.

15. * $(1 + T/n)^n \to e^T$ as $n \to \infty$.
(Hint: each component in the series is $\frac{1}{n^k}\binom{n}{k}T^k \to \frac{1}{k!}T^k$, then use Exercise 9.7(1).)

16. * The product of $n$ terms, $(1 + S/n)(1 + T/n)(1 + S/n)\cdots(1 + T/n) \to e^{S+T}$ as $n \to \infty$. (At least show convergence for each power term.)


17. * Trotter formula: $e^{S/n}e^{T/n}e^{S/n}\cdots e^{T/n} \to e^{S+T}$. For example,

$$e^{S+T} \approx e^{S/2}e^{T/2}e^{S/2}e^{T/2}.$$

Find the exact coefficients used in the Trotter-Suzuki approximation

$$e^{0.293S}e^{0.707T}e^{0.707S}e^{0.293T}$$

that make it the best possible to second order. These formulas are very useful to approximate $e^{S+T}$ whenever $S$ and $T$ do not commute.


13.3 The Group of Invertible Elements

Among the invertible elements of a Banach algebra, one finds all the exponentials $e^T$ (including all non-zero complex numbers) and all their products, as well as the unit ball around 1, as the next key theorem proves:

Theorem 13.20

If $\rho(T) < 1$ then $1 - T$ is invertible, $(1 - T)^{-1} = 1 + T + T^2 + \cdots$

Proof The radius of convergence of the series $\sum_n T^n$ is 1, by Hadamard's formula. For $\rho(T) < 1$, let $S_N := 1 + T + \cdots + T^N \to \sum_{n=0}^\infty T^n$. Then, remembering that $\rho(T) < 1 \Rightarrow T^N \to 0$ as $N \to \infty$ (Example 13.13(3)),

$$S_N = 1 + T + \cdots + T^N$$
$$TS_N = T + \cdots + T^N + T^{N+1}$$
$$\Rightarrow\quad (1 - T)S_N = 1 - T^{N+1} \to 1.$$

Similarly, $S_N(1 - T) \to 1$ as $N \to \infty$. This shows that $\sum_{n=0}^\infty T^n$ is the inverse of $1 - T$. □
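The geometric series for $(1 - T)^{-1}$ can be tried out on a matrix whose norm exceeds 1 but whose spectral radius is below 1. This is a minimal Python sketch; the test matrix, the number of terms, and the helper names are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def neumann_partial_sum(T, terms):
    """Partial sum 1 + T + ... + T^(terms-1), approximating (1 - T)^(-1)."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(1, terms):
        power = matmul(power, T)
        total = matadd(total, power)
    return total

# Assumed test matrix with rho(T) = 0.5 although its norm is larger than 10.
T = [[0.5, 10.0], [0.0, 0.5]]
one_minus_T = [[0.5, -10.0], [0.0, 0.5]]

S = neumann_partial_sum(T, 200)
product = matmul(one_minus_T, S)

print(S)         # close to the exact inverse [[2, 40], [0, 2]]
print(product)   # close to the identity matrix
```

The exact inverse here is $\begin{pmatrix} 2 & 40 \\ 0 & 2 \end{pmatrix}$, since $\sum_n 0.5^n = 2$ and $10\sum_n n\,0.5^{n-1} = 40$.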


Theorem 13.21 


The invertible elements of a Banach algebra $\mathcal{X}$ form a group $\mathcal{G}(\mathcal{X})$ with the operation of multiplication. $\mathcal{G}(\mathcal{X})$ is an open set in $\mathcal{X}$, and the map $T \mapsto T^{-1}$ is differentiable on it.

Proof Multiplication in a Banach algebra is associative and has a unity $1 \in \mathcal{G}(\mathcal{X})$. To prove $\mathcal{G}(\mathcal{X})$ is a group, it needs to be shown that if $S, T \in \mathcal{G}(\mathcal{X})$, then $ST$ and $T^{-1}$ are invertible, a fact that is evident from

$$(ST)^{-1} = T^{-1}S^{-1}, \qquad (T^{-1})^{-1} = T.$$
Let $T$ be any invertible element of $\mathcal{X}$, and consider any neighboring element

$$T + H = T(1 + T^{-1}H)$$

with $\|H\| < \|T^{-1}\|^{-1}$. Then $\rho(T^{-1}H) \le \|T^{-1}\|\|H\| < 1$, so that $1 + T^{-1}H$, and by implication $T + H$, are invertible. As the neighboring points of $T$ are invertible, $T$ is an interior point of $\mathcal{G}(\mathcal{X})$ and the group is open in $\mathcal{X}$.

In fact, writing $T + H = T(1 + T^{-1}H)$,

$$(T + H)^{-1} = (1 + T^{-1}H)^{-1}T^{-1} = T^{-1} - T^{-1}HT^{-1} + T^{-1}HT^{-1}HT^{-1} - \cdots$$
This shows that $T \mapsto T^{-1}$ is differentiable with derivative $H \mapsto -T^{-1}HT^{-1}$, by verifying

$$\|T^{-1}HT^{-1}HT^{-1} - \cdots\| \le \sum_{n=2}^\infty \|T^{-1}\|^{n+1}\|H\|^n = \frac{\|T^{-1}\|^3\|H\|^2}{1 - \|T^{-1}\|\|H\|} = o(H).$$

□
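The derivative formula $H \mapsto -T^{-1}HT^{-1}$ can be sanity-checked numerically: the error of the first-order approximation should shrink like $\|H\|^2$. The following Python sketch assumes a particular invertible $2 \times 2$ matrix and perturbation direction; the names are hypothetical.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of a 2x2 matrix via the adjugate formula (assumes det != 0)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def first_order_error(T, H):
    """Largest entry of (T + H)^(-1) - (T^(-1) - T^(-1) H T^(-1)) in absolute value."""
    T_inv = inverse(T)
    exact = inverse([[T[i][j] + H[i][j] for j in range(2)] for i in range(2)])
    correction = matmul(matmul(T_inv, H), T_inv)
    return max(abs(exact[i][j] - (T_inv[i][j] - correction[i][j]))
               for i in range(2) for j in range(2))

T = [[2.0, 1.0], [0.0, 1.0]]            # assumed invertible test matrix
direction = [[0.3, -0.2], [0.1, 0.4]]   # assumed perturbation direction

errors = []
for t in (1e-1, 1e-2, 1e-3):
    H = [[t * x for x in row] for row in direction]
    errors.append(first_order_error(T, H))

print(errors)   # each error is roughly 100 times smaller than the previous one
```

Shrinking $H$ by a factor of 10 reduces the error by a factor of about 100, consistent with a remainder of order $\|H\|^2$.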
A group, for which the acts of multiplication and taking the inverse are differentiable, is called a 'Lie group', a topic that has a vast literature devoted to it.
In particular note that for $H = z1$,

$$(T + z)^{-1} = T^{-1} - zT^{-2} + z^2T^{-3} - \cdots, \qquad (13.3)$$

and that the map $z \mapsto T - z \mapsto (T - z)^{-1}$ is analytic wherever the inverse exists; its derivative is $(T - z)^{-2}$.


Examples 13.22 


1. The group of N x N invertible complex matrices is often denoted GL(N, C). 
It has a group-morphism, the determinant $\det : GL(N, \mathbb{C}) \to \mathbb{C}^\times = \mathcal{G}(\mathbb{C})$,

$$\det AB = \det A \det B$$

whose kernel is the normal subgroup $SL(N, \mathbb{C})$ of 'special matrices' with determinant 1.

2. In $\mathbb{C}$, when $z$ is large, $z^{-1}$ is small. But for general Banach algebras there is no such relation between $\|T^{-1}\|$ and $\|T\|$, e.g. the inverse of $(10, 0.01) \in \mathbb{C}^2$ is $(0.1, 100)$.

3. The set of non-invertible elements is closed in $\mathcal{X}$. So the closure of a proper ideal is a proper ideal.

Proof By Example 13.5(5), $\mathcal{I} \subseteq \mathcal{G}(\mathcal{X})^c$, so $\bar{\mathcal{I}} \subseteq \mathcal{G}(\mathcal{X})^c$ and $\bar{\mathcal{I}}$ does not contain 1.

4. If $T$ is invertible, then $B_\epsilon(TS) \subseteq TB_{\|T^{-1}\|\epsilon}(S)$. Consequently, multiplication by $T$ is an open mapping.

Proof Let $\|A - TS\| < \epsilon$; then $\|T^{-1}A - S\| \le \|T^{-1}\|\|A - TS\| < \|T^{-1}\|\epsilon$, as required. If $U$ is an open set in $\mathcal{X}$ and $S \in U$, then $S \in B_\epsilon(S) \subseteq U$, so
$$TS \in B_{\epsilon/\|T^{-1}\|}(TS) \subseteq TB_\epsilon(S) \subseteq TU,$$
and $TU$ is open in $\mathcal{X}$.

5. The set of non-invertible elements is path-connected (to the origin, say), and may disconnect the group of invertible elements, e.g. $GL(2, \mathbb{R})$ disconnects into the two open sets of matrices whose determinants are strictly positive and strictly negative, respectively.


The following proposition confirms that as an invertible operator R approaches 
the boundary of G(X), ||R~!|| grows to infinity, as expected. 


Proposition 13.23 


Let $T$ be on the boundary of the group of invertible elements.

(i) For any invertible element $R$, $\|R^{-1}\| \ge 1/\|R - T\|$,

(ii) $T$ is a topological divisor of zero, meaning there are unit elements $S_n$ such that

$$TS_n \to 0 \ \text{AND}\ S_nT \to 0, \quad\text{as } n \to \infty.$$

Proof (i) Since $T$ is at the boundary of the open set of invertible elements, it cannot be invertible, whereas $R$ and all elements in its surrounding ball of radius $\|R^{-1}\|^{-1}$ are invertible, by the proof of the previous theorem. Thus $\|R - T\| \ge \|R^{-1}\|^{-1}$ as claimed.

(ii) Let invertible elements $R_n$ converge to a boundary element $T$, and let $S_n := R_n^{-1}/\|R_n^{-1}\|$; then

$$\|TS_n\| = \|TR_n^{-1}\|/\|R_n^{-1}\|$$
$$= \|(T - R_n)R_n^{-1} + 1\|/\|R_n^{-1}\|$$
$$\le \|T - R_n\| + 1/\|R_n^{-1}\|$$
$$\le 2\|T - R_n\| \to 0 \quad\text{as } n \to \infty,$$
and similarly $S_nT \to 0$ as well. □


As remarked earlier, the group $\mathcal{G}(\mathcal{X})$ need not be a connected set, but splits into connected components, with, say, $\mathcal{G}_1$ being the component containing 1. Recall that a component is maximal connected, so if $\mathcal{G}_1$ contains part of a connected subset of $\mathcal{G}(\mathcal{X})$, it must contain all of it (Theorem 5.11).

Proposition 13.24

The component of invertible elements containing 1 is the open normal subgroup generated by $e^T$ for all $T$.

Proof $\mathcal{G}_1$ is open in $\mathcal{G}(\mathcal{X})$: Any $T \in \mathcal{G}_1$ is an interior point of $\mathcal{G}(\mathcal{X})$, so $T \in B_\epsilon(T) \subseteq \mathcal{G}(\mathcal{X})$. But the ball $B_\epsilon(T)$ is (path-)connected and intersects $\mathcal{G}_1$, so $B_\epsilon(T) \subseteq \mathcal{G}_1$.

$\mathcal{G}_1$ is a subgroup of $\mathcal{G}(\mathcal{X})$: Multiplication by $T$ is a continuous operation, so $T\mathcal{G}_1$ is connected (Proposition 5.5). When $T \in \mathcal{G}_1$, then $T = T1 \in T\mathcal{G}_1 \subseteq \mathcal{G}(\mathcal{X})$, so $\mathcal{G}_1$ contains part, and therefore all, of $T\mathcal{G}_1$. Hence $T, S \in \mathcal{G}_1 \Rightarrow TS \in T\mathcal{G}_1 \subseteq \mathcal{G}_1$. Similarly, inversion is a continuous mapping, so $\mathcal{G}_1^{-1}$ is connected; it contains 1, so must be a subset of $\mathcal{G}_1$, i.e., $T \in \mathcal{G}_1 \Rightarrow T^{-1} \in \mathcal{G}_1$.

$\mathcal{G}_1$ is a normal subgroup: By the same reasoning, for any invertible $T$, $T^{-1}\mathcal{G}_1T$ is a connected subset of $\mathcal{G}(\mathcal{X})$ and contains 1, so it is a subset of $\mathcal{G}_1$ (in fact it must equal it).

$\mathcal{G}_1$ is generated by the exponentials: Let $\mathcal{E}$ be the group generated by the exponentials $e^T$ for all $T \in \mathcal{X}$; its elements are finite products $e^T \cdots e^S$, and their inverses are of the same type $(e^T \cdots e^S)^{-1} = e^{-S} \cdots e^{-T}$. It contains $1 = e^0$, and is connected since there is a continuous path from 1 to every element $e^T \cdots e^S$, namely $t \mapsto e^{tT} \cdots e^{tS}$ for $t \in [0, 1]$. We can conclude that $\mathcal{E}$ lies inside $\mathcal{G}_1$.

The elements near to 1 are all exponentials,¹ $1 + H = e^{\log(1+H)}$, and so a small enough neighborhood around $E := e^T \cdots e^S \in \mathcal{E}$ consists of elements

$$E + H = E(1 + E^{-1}H) = e^T \cdots e^S e^{\log(1 + E^{-1}H)} \in \mathcal{E}$$

for $\|H\| < e^{-\|S\|} \cdots e^{-\|T\|}$. This means that $e^T \cdots e^S$ is an interior point of $\mathcal{E}$, which is thus open. Its complement in $\mathcal{G}_1$ is also open, since $\mathcal{G}_1 \setminus \mathcal{E} = \bigcup_{T \in \mathcal{G}_1 \setminus \mathcal{E}} T\mathcal{E}$ (prove!) and each $T\mathcal{E}$ is open (Example 13.22(4)). $\mathcal{E}$, being open and closed in $\mathcal{G}_1$, must equal $\mathcal{G}_1$ (Proposition 5.3). □


Exercises 13.25 


1. The invertible elements of $\mathbb{C}^N$ are $(z_1, \ldots, z_N)$ such that none of the components are zero.

2. In $\ell^\infty$, a sequence $(a_n)$ is invertible if, and only if, it is bounded away from 0, i.e., $0 < c \le |a_n|$. Paths $t \mapsto w(t)$ in $C[0, 1]$ are invertible when they do not pass through 0.

3. In $B(X)$, the invertible elements are the automorphisms of $X$.

4. In $B(X)$, $\|T^{-1}\| = 1/\inf_{\|x\|=1}\|Tx\|$.

5. In $\mathcal{X} \times \mathcal{Y}$, $(S, T)$ is invertible if, and only if, both $S$ and $T$ are invertible.

6. The integral operator on $C[a, b]$, $Tf(y) := \int_a^b k(x, y)f(x)\,dx$ has norm satisfying $\|T\| \le \|k\|_{L^\infty}|b - a|$. Deduce that when $\|k\|_{L^\infty} < 1/|b - a|$, the equation $Tf + g = f$ has the unique solution $f = \sum_{n=0}^\infty T^ng$.


7. If T is invertible and Tx = y, (T + H)(x + xe) = y, then ie 2 {re 


¹ This was stated, not proved, in Proposition 13.18 but the argument is not circular.

8. The map $t \mapsto e^{tT}$ is a differentiable group-morphism $\mathbb{R} \to \mathcal{G}(\mathcal{X})$; its derivative at $t$ is $Te^{tT}$.

9. * Conversely, every differentiable group-morphism $A : \mathbb{R} \to \mathcal{G}(\mathcal{X})$, meaning $A_{t+s} = A_tA_s$, is of this type:

(a) $\exists h > 0$, $\int_0^h A_t\,dt$ is invertible, by the mean value theorem (Proposition 12.9), and $\int_t^{t+h} A = (\int_0^h A)A_t$;

(b) Let $T := (A_h - 1)(\int_0^h A)^{-1}$, so that $A_{t+h} = A_t + hTA_t + o(h)$;

(c) $\frac{d}{dt}(A_te^{-tT}) = (\frac{d}{dt}A_t)e^{-tT} - A_tTe^{-tT} = 0$, so $A_t = A_0e^{tT} = e^{tT}$.


10. Verify Proposition 13.23 for e 1) =F ¢ a 


n 


11. A topological divisor of zero, also called a generalized divisor of zero, does not 
have right or left inverses. 


12. The right-shift operator R on £° is a right divisor of zero but not a topological 
divisor of zero. 


13. In finite dimensions, there is no distinction between divisors of zero and topo- 
logical ones. (Hint: S$, € Bz, which is compact.) 


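14. An isomorphism between Banach algebras preserves topological divisors of zero.

15. If $R$ is invertible, then $\|R^{-1}\| \ge 1/d(R, \partial\mathcal{G}(\mathcal{X}))$. (Hint: By the definition of $d(R, \partial\mathcal{G}(\mathcal{X}))$ (Example 2.20(9)), there is a sequence $T_n \in \partial\mathcal{G}(\mathcal{X})$ such that $\|T_n - R\| \to d(R, \partial\mathcal{G}(\mathcal{X}))$.)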


13.4 Analytic Functions 


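There are two ways of connecting the coefficients of a power series to its function $f(z) = a_0 + a_1z + a_2z^2 + \cdots$:
(i) by differentiation
$$f^{(n)}(z) = n!\,a_n + (n+1)!\,a_{n+1}z + \cdots \ \Rightarrow\ f^{(n)}(0) = n!\,a_n.$$

(ii) by integration

$$\frac{f(z)}{z^{n+1}} = \frac{a_0}{z^{n+1}} + \cdots + \frac{a_n}{z} + a_{n+1} + \cdots \ \Rightarrow\ \oint \frac{f(z)}{z^{n+1}}\,dz = 2\pi i\,a_n.$$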


These formulas raise the possibility of creating a power series from a given function, by defining the coefficients in these ways. The latter one is more useful because it does not assume $f$ to be differentiable infinitely often.

Theorem 13.26 Taylor series

If $f : \mathbb{C} \to \mathbb{C}$ is analytic in a disk $B_R(0)$, then it is a power series inside the disk. For $\rho(T) < R$,

$$f(T) := \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=0}^\infty a_nT^n$$

where

$$a_n = \frac{1}{2\pi i}\oint f(z)z^{-n-1}\,dz = \frac{f^{(n)}(0)}{n!}, \quad \forall n \in \mathbb{N},$$

and

$$\forall r < R,\ \exists c_r,\ \forall n \in \mathbb{N}, \quad |a_n| \le \frac{c_r}{r^n}.$$

Proof The path of integration is along a circle with center 0 and radius $r$ just less than $R$ (but larger than $\rho(T)$). For $z$ on this circle, $\rho(T/z) = \rho(T)/r < 1$, so

$$(z - T)^{-1} = z^{-1}(1 - T/z)^{-1} = \sum_{n=0}^\infty \frac{T^n}{z^{n+1}}, \quad\text{and}$$

$$\frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz = \sum_{n=0}^\infty \frac{1}{2\pi i}\oint f(z)z^{-1-n}\,dz\ T^n = \sum_{n=0}^\infty a_nT^n.$$

However we need to justify the swap of the summation with the integral. Recall that $z \mapsto (z - T) \mapsto (z - T)^{-1}$ is continuous in $z$ by (13.3), and the circle is a compact set, so $\|f(z)(z - T)^{-1}\| \le C$ for $z$ on the circle (Corollary 6.16). It follows that

$$\Big\|\sum_{n=N}^\infty f(z)T^n/z^{n+1}\Big\| = \|f(z)(z - T)^{-1}T^N/z^N\| \le C\big(\|T^N\|^{1/N}/r\big)^N \to 0$$

uniformly in $z$. So $\sum_{n=0}^N \oint f(z)T^n/z^{n+1}\,dz \to \oint \sum_{n=0}^\infty f(z)T^n/z^{n+1}\,dz$.
Note that

$$|a_n| \le \frac{1}{2\pi}\oint |f(z)||z|^{-n-1}\,|dz| \le c/r^n,$$

where $c$ is the maximum value of $|f|$ on the compact disk $B_r[0] \subset \mathbb{C}$. The radius of convergence of this power series is at least $R$ since

$$\liminf_n |a_n|^{-1/n} \ge \lim_{n\to\infty} \frac{r}{c^{1/n}} = r, \quad \forall r < R.$$

To justify the use of the notation $f(T)$, we need to show that when $T$ is a complex number $a1$, the two uses of the symbol $f$ agree, i.e., $f(a1) = f(a)1$; but this is just Cauchy's integral formula, $f(a) = \frac{1}{2\pi i}\oint f(z)/(z - a)\,dz$. Consequently an analytic function is indeed a power series. □
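The integral formula for the coefficients lends itself to a quick numerical check. The sketch below approximates $a_n = \frac{1}{2\pi i}\oint f(z)z^{-n-1}\,dz$ by an equally spaced sum over the circle $|z| = r$, using the parametrization $z = re^{i\theta}$, $dz = iz\,d\theta$. The function name, the radius, the number of sample points, and the choice $f = \exp$ are assumptions for illustration.

```python
import cmath
import math

def taylor_coefficient(f, n, radius=1.0, samples=256):
    """Approximate (1/(2*pi*i)) * integral of f(z) z^(-n-1) dz over |z| = radius."""
    total = 0.0 + 0.0j
    for k in range(samples):
        z = radius * cmath.exp(2j * math.pi * k / samples)
        # With dz = i z dtheta, the integrand f(z) z^(-n-1) dz / (2 pi i)
        # becomes f(z) z^(-n) dtheta / (2 pi); the sum below averages it over theta.
        total += f(z) * z ** (-n)
    return total / samples

coefficients = [taylor_coefficient(cmath.exp, n) for n in range(6)]

for n, a_n in enumerate(coefficients):
    print(n, a_n.real, 1.0 / math.factorial(n))
```

For the exponential function the computed values agree with $1/n!$, the coefficients obtained by differentiation, to within rounding error.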


Proposition 13.27 Liouville’s theorem 


If an analytic function on $\mathbb{C}$ grows polynomially, $|f(z)| \le c|z|^n$, then $f$ is a polynomial of degree at most $n$. In particular, if $f$ is bounded then it is constant.

Proof If $f : \mathbb{C} \to \mathbb{C}$ is analytic on $\mathbb{C}$, and grows polynomially, then its maximum value on a disk of radius $r$ is $c_r \le cr^n$. So the $m$th Taylor coefficient vanishes for $m > n$,

$$|a_m| \le c_r/r^m \le cr^{n-m} \to 0 \quad\text{as } r \to \infty.$$
This also applies to vector-valued analytic functions $F : \mathbb{C} \to X$. For any functional $\phi \in X^*$, $\phi \circ F : \mathbb{C} \to \mathbb{C}$ is also analytic. If $F$ grows polynomially, then so does $\phi \circ F$,

$$|\phi \circ F(z)| \le \|\phi\|\|F(z)\| \le \|\phi\|c|z|^n,$$

which implies that $\phi \circ F(z)$ is a polynomial $a_0 + a_1z + \cdots + a_nz^n$. In fact, by Example 12.3(3), $a_n = \phi \circ F^{(n)}(0)/n!$, so that

$$\phi \circ F(z) = \phi \circ \big(F(0) + F'(0)z + \cdots + F^{(n)}(0)z^n/n!\big).$$

As $\phi$ is arbitrary, we deduce that $F(z)$ is a polynomial in $z$. □


Theorem 13.28 Laurent series 


If $f\colon \mathbb{C} \to \mathbb{C}$ is analytic in a ring $B_R(0) \setminus B_r[0]$, and $r < \rho(T^{-1})^{-1} \le \rho(T) < R$, then

$$f(T) = \frac{1}{2\pi i}\oint f(z)(z-T)^{-1}\,dz = \sum_{n=-\infty}^{\infty} a_n T^n,$$

where $a_n = \frac{1}{2\pi i}\oint f(z)\,z^{-1-n}\,dz$, $\forall n \in \mathbb{Z}$. The residue of $f$ in $B_r[0]$ is $a_{-1}$.


The path of integration is here understood to be just within the boundary of the
ring, going counter-clockwise around a circle of radius just smaller than $R$, and
clockwise around a circle just larger than $r$. Note that $R$ is allowed to be infinite, in
which case substitute $R$ with any value larger than $\rho(T)$.


Proof A Laurent series can be thought of as the sum of two separate power series,
$\sum_{n=0}^{\infty} a_n T^n + \sum_{n=1}^{\infty} a_{-n} T^{-n}$, one in $T$ and the other in $T^{-1}$. If $R$ and $R'$ are
the respective radii of convergence, then absolute convergence occurs only when
$\rho(T) \le R$ and $\rho(T^{-1}) \le R'$.

For $z$ on the bigger circle, $\rho(T/z) = \rho(T)/|z| < 1$ if the radius is close enough
to $R$, so just like the proof of the Taylor series,

$$\frac{1}{2\pi i}\oint f(z)(z-T)^{-1}\,dz = \sum_{n=0}^{\infty} a_n T^n.$$

For $z$ on the smaller circle, $\rho(zT^{-1}) = |z|\rho(T^{-1}) < 1$ when its radius is close
enough to $r$, so

$$(z-T)^{-1} = -(1 - zT^{-1})^{-1}T^{-1} = -\sum_{n=0}^{\infty} z^n T^{-n-1},$$

and (along a counter-clockwise path)

$$-\frac{1}{2\pi i}\oint f(z)(z-T)^{-1}\,dz = \sum_{n=1}^{\infty} \frac{1}{2\pi i}\oint f(z)\,z^{n-1}\,dz\; T^{-n} = \sum_{n=1}^{\infty} a_{-n} T^{-n}.$$

Combining the two integrals and series gives Laurent's expansion. Note that the
second series vanishes when $f$ is analytic within $B_r(0)$, by Cauchy's theorem, so it
is consistent with Taylor's theorem.

Since the Laurent series converges uniformly strictly within the annulus, we obtain

$$\frac{1}{2\pi i}\oint f(z)\,dz = \frac{1}{2\pi i}\sum_{n=-\infty}^{\infty} \oint a_n z^n\,dz = a_{-1}.$$

□


These two theorems of course also apply, by translating, to disks and rings with
center $z_0$; the resulting series will then be $\sum_n a_n (T - z_0)^n$.


Proposition 13.29 


The zeros of a non-zero analytic function, defined on an open connected 
subset of $\mathbb{C}$, are isolated.

Proof Suppose an interior zero $w$ of $f\colon \Omega \to \mathbb{C}$ is a limit point of other zeros,
$z_n \to w$ ($z_n \ne w$). Then $f$ can be written as a power series $f(z) = \sum_k a_k (z-w)^k$
in some neighborhood of $w$. If $a_K$ is the first non-zero coefficient, then

$$0 = f(z_n) = (z_n - w)^K\bigl(a_K + a_{K+1}(z_n - w) + \cdots\bigr),$$

$$\therefore\quad 0 = a_K + a_{K+1}(z_n - w) + \cdots \to a_K \quad\text{as } z_n \to w.$$

This contradiction determines that $f$ is locally zero in $\Omega$. Hence it is zero in $\Omega$
(Exercise 5.7(9)). □


Examples 13.30 


1. The Fourier series $\sum_{n=-\infty}^{\infty} a_n e^{in\theta}$ is a Laurent series with $T = e^{i\theta}$.


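2. » For polynomials (and circular paths as in the theorems),

$$p(T) = \frac{1}{2\pi i}\oint p(z)(z-T)^{-1}\,dz.$$

For example,

$$1 = \frac{1}{2\pi i}\oint (z-T)^{-1}\,dz, \quad T = \frac{1}{2\pi i}\oint z(z-T)^{-1}\,dz, \quad T^{-1} = \frac{1}{2\pi i}\oint z^{-1}(z-T)^{-1}\,dz.$$

Proof for $T^{-1}$: We can use Laurent's expansion on a path $z(\theta) = re^{i\theta}$ since $1/z$
is analytic everywhere except at 0,

$$a_n = \frac{1}{2\pi i}\oint z^{-1} z^{-1-n}\,dz = \frac{1}{2\pi}\int_0^{2\pi} r^{-1-n} e^{-i(1+n)\theta}\,d\theta = 0$$

unless $n = -1$, in which case $a_{-1} = 1$. So $\sum_n a_n T^n = T^{-1}$.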


3. » We can finally show $e^{\log(1+T)} = 1 + T$ for $\rho(T) < 1$.


Proof Let $f(z) := e^{\log(1+z)}$ for $|z| < 1$; then $f'(z) = e^{\log(1+z)}/(1+z)$ and
$f''(z) = 0$ (check!). So the non-zero coefficients of its Taylor series are $a_0 =
f(0) = e^0 = 1$ and $a_1 = f'(0) = 1$. Hence $f(T) = 1 + T$.


4. Binomial theorem: $(1+T)^p := e^{p\log(1+T)} = 1 + pT + \binom{p}{2}T^2 + \cdots$ provided
$\rho(T) < 1$, $p \in \mathbb{C}$, and $\binom{p}{n} := \frac{p(p-1)\cdots(p-n+1)}{n!}$.

Proof Define the analytic function $f(z) := (1+z)^p = e^{p\log(1+z)}$ inside the unit
disk $B_{\mathbb{C}}$. Its derivatives are, by induction,

$$f^{(n)}(z) = p(p-1)\cdots(p-n+1)\,e^{(p-n+1)\log(1+z)}(1+z)^{-1} = p(p-1)\cdots(p-n+1)(1+z)^{p-n},$$

so its power series coefficients are $a_n = f^{(n)}(0)/n! = \binom{p}{n}$.


5. » There are versions of these series expansions valid for a vector-valued function
$F\colon \mathbb{C} \to X$, where $X$ is a Banach space and $F$ is analytic inside a ring,
$r < |z| < R$,

$$F(z) = \frac{1}{2\pi i}\oint F(w)(w-z)^{-1}\,dw = \sum_n A_n z^n,$$

where $A_n := \frac{1}{2\pi i}\oint F(w)\,w^{-n-1}\,dw \in X$.

Proof For any $\phi \in X^*$, the map $\phi \circ F\colon \mathbb{C} \to \mathbb{C}$, being the composition of differentiable
functions, is analytic on the ring $B_R(0) \setminus B_r[0]$, so it has a Laurent
expansion $\phi \circ F(z) = \frac{1}{2\pi i}\oint \phi \circ F(w)(w-z)^{-1}\,dw = \sum_n b_n z^n$ for $r < |z| < R$
and $b_n = \phi A_n$. But $\phi$ is linear and continuous, so it can be extracted out of the
integrals and series,

$$\phi \circ F(z) = \phi\Bigl(\frac{1}{2\pi i}\oint F(w)(w-z)^{-1}\,dw\Bigr) = \phi\sum_n A_n z^n,$$

and as $\phi$ is arbitrary, the result follows.


Exercises 13.31 


1. Let $T :=$ [a $2\times 2$ matrix whose entries are illegible in this copy]; verify directly that $T = \frac{1}{2\pi i}\oint z(z-T)^{-1}\,dz$ by calculating
the integral in a circular path around the origin.


2. Show that there are no analytic functions in $\mathbb{C}$ which grow at a fractional power
rate $|z|^{m/n}$ ($m/n \notin \mathbb{N}$).


3. Show that the Laurent series for $\cot T$, valid for $\rho(T) < \pi$, $\rho(T^{-1})^{-1} > 0$, is

$$\cot T = T^{-1} - \frac{T}{3} - \frac{T^3}{45} - \frac{2T^5}{945} - \cdots$$

and find its residue at 0. (Hint: $\cot z = (1 - z^2/2 + z^4/24 + \cdots)/z(1 - z^2/6 + \cdots)$.)


4. If an identity between analytic functions, $f(z) = g(z)$, holds in a complex disk
$B_R(0)$, then it holds for any $T$ with $\rho(T) < R$.


5. Justify the identity $n\log(1+T) = \log(1+T)^n$, hence deduce the assertion
$\lim_{n\to\infty}(1 + T/n)^n = e^T$.


6. A function on $\mathbb{C}$ has a pole $a$ of order $N$ if, and only if, it has a Laurent series
expansion $\sum_{n=-N}^{\infty} a_n (z-a)^n$ about $a$; its residue is $a_{-1}$.


7. * Two analytic functions on an open connected subset of C must be identically 
equal if they are equal on an interior disk. (Consider the interior of the set for 
which f = g.) 


8. Suppose $f$ is analytic on the extended complex plane, except for isolated points,
i.e., $f(1/z)$ is also analytic at 0.


(a) Show that f has a finite number of zeros and poles (except for f = 0), 
(b) Using polynomials p, g whose roots are these zeros and poles, respectively, 
deduce that f is a rational function p/q. 


Remarks 13.32 


1. A subalgebra must have the same unity as the algebra—it is not enough that it 
has a unity. For example, C (Exercise 13.10(2)) contains the set { (0, a): a € C} 
which is closed under addition and multiplication and has its own unity (0, 1), 
different from C’s unity (1, 0); it is an algebra, but not a subalgebra of C. Instead, 
the set { (a, 0) : a € C} is a subalgebra of C. 


2. The axiom $\Phi 1 = 1$ of an algebra morphism does not follow from the other
properties of $\Phi$. For example, the map $\Phi$ from $\mathbb{C}$ into the algebra of the previous remark, defined by $\Phi(z) :=
(0, z)$, satisfies all the properties of a Banach algebra morphism, except that
$\Phi(1) = (0, 1) \ne (1, 0)$. But continuity of characters follows from their other
properties (Proposition 14.34).


3. * The proof of the embedding of $\mathcal{X}$ into $B(\mathcal{X})$ does not make essential use of the
axiom $\|1\| = 1$, or of $\|ax\| \le \|a\|\|x\|$. If instead, $\|1\| = c$ and $\|ax\| \le c'\|a\|\|x\|$,
one gets

$$\|a\| = \|L_a 1\| \le c\,\|L_a\|, \qquad \|L_a\| \le c'\|a\|.$$

Thus $\mathcal{X}$ has an equivalent norm defined by $|||a||| := \|L_a\|$, with $|||1||| = \|I\| = 1$
and

$$|||xy||| = \|L_{xy}\| = \|L_x L_y\| \le \|L_x\|\,\|L_y\| = |||x|||\,|||y|||.$$


4. In the Banach algebra $B(X)$, one can define $\rho_x(T) := \limsup_n \|T^n x\|^{1/n}$; so
$0 \le \rho_x(T) \le \rho(T)$. The series $\sum_n a_n T^n x$ converges absolutely when $\rho_x(T)$ is
less than the radius of convergence.


Chapter 14 
Spectral Theory 


A moment's reflection shows that, by Cauchy's residue theorem, the path of
integration in $f(T) = \frac{1}{2\pi i}\oint f(z)(z-T)^{-1}\,dz$ can be modified, as long as $f$ and
$(z-T)^{-1}$ remain analytic over the swept region. We are thus led to study the region
where $z - T$ is not invertible, called the spectrum of $T$.


Definition 14.1 


The spectrum of an element $T$ in a Banach algebra is defined as the set
$\sigma(T) := \{\lambda \in \mathbb{C} : T - \lambda \text{ is not invertible}\}$.


Its complement $\mathbb{C} \setminus \sigma(T)$ is called the resolvent of $T$.


Examples 14.2 
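1. $\sigma(z) = \{z\}$ (since $z - \lambda$ is not invertible when $\lambda = z$).
2. » Recall that a square matrix $A$ is non-invertible $\Leftrightarrow$ $A$ is not 1-1 $\Leftrightarrow$ $\det A = 0$.


The spectrum of an $n \times n$ matrix consists of its eigenvalues, i.e., the roots of the
characteristic polynomial equation $\det(T - \lambda) = 0$ of degree $n$.


For example, the spectra of the $2 \times 2$ matrices $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and
$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ are $\{0\}$, $\{0\}$, $\{-1, 1\}$, and $\{a, b\}$ respectively.


Note that it is possible to have different elements with the same spectrum. The
spectrum is a sort of 'shadow' of $T$: it yields important information about $T$,
but need not identify it.


3. » The spectrum of a sequence $x = (a_n) \in \ell^\infty$ is $\sigma(x) = \overline{\operatorname{im} x} = \overline{\{a_n : n \in \mathbb{N}\}}$.


Proof The inverse of the sequence $x - \lambda = (a_n - \lambda)$ is bounded iff $|a_n - \lambda| \ge c > 0$
for all $n$, hence $\lambda \notin \sigma(x)$ $\Leftrightarrow$ $\lambda$ is an exterior point of $\{a_n\}$.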


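J. Muscat, Functional Analysis, DOI: 10.1007/978-3-319-06728-5_14

4. A spectral value of an operator $T \in B(X)$ is a complex number $\lambda$ for which the
equation $(T - \lambda)x = y$ is not well-posed; one sometimes sees in practice that as
one varies a parameter $\lambda$ of a model, some specific values have unstable solutions
that 'resonate'.


5. (a) » Translations, 'rotations' (in the sense of multiplication by $e^{i\theta}$) and scaling
of $T$ have corresponding actions on its spectrum:

$$\sigma(T + z) = \sigma(T) + z, \qquad \sigma(zT) = z\,\sigma(T),$$

since $(T + z) - \lambda = T - (\lambda - z)$, so $\lambda \in \sigma(T + z) \Leftrightarrow \lambda - z \in \sigma(T)$; for
$z \ne 0$, $(zT) - \lambda = z(T - \lambda/z)$, so $\lambda \in \sigma(zT) \Leftrightarrow \lambda/z \in \sigma(T)$.

(b) If $T$ is invertible, then $\sigma(T^{-1}) = \sigma(T)^{-1} := \{\lambda^{-1} : \lambda \in \sigma(T)\}$, since
$T^{-1} - \lambda = -\lambda T^{-1}(T - \lambda^{-1})$, so $\lambda \in \sigma(T^{-1}) \Leftrightarrow \lambda^{-1} \in \sigma(T)$ (note that
$\lambda \ne 0$).

(c) The matrices $S := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $T := \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ show that there is no simple
relation between $\sigma(S + T)$ or $\sigma(ST)$ and $\sigma(S)$ and $\sigma(T)$ in general.

(d) $\sigma(ST) = \sigma(TS) \cup \{0\}$ or $\sigma(ST) = \sigma(TS) \setminus \{0\}$.


Proof For $\lambda \ne 0$ and $ST - \lambda$ invertible, $(TS - \lambda)^{-1} = \frac{1}{\lambda}\bigl(T(ST - \lambda)^{-1}S - 1\bigr)$,
since

$$(TS - \lambda)\bigl(T(ST - \lambda)^{-1}S - 1\bigr) = T(ST - \lambda)(ST - \lambda)^{-1}S - (TS - \lambda) = \lambda,$$
$$\bigl(T(ST - \lambda)^{-1}S - 1\bigr)(TS - \lambda) = T(ST - \lambda)^{-1}(ST - \lambda)S - (TS - \lambda) = \lambda.$$

Thus, $\sigma(TS) \subseteq \sigma(ST) \cup \{0\}$; indeed, reversing the roles of $S$ and $T$ shows
$\sigma(TS) \cup \{0\} = \sigma(ST) \cup \{0\}$.
(e) In particular, $\sigma(S^{-1}TS) = \sigma(T)$.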
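The resolvent formula used in the proof of part (d) is easy to test on small matrices. The sketch below is illustrative only: the matrices `S`, `T` and the value `lam` are arbitrary choices made here (not taken from the text), and the helpers `mul`, `shift` and `inv` are hypothetical names for $2\times 2$ arithmetic.

```python
def mul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def shift(A, lam):
    """Return A - lam * 1."""
    return ((A[0][0] - lam, A[0][1]), (A[1][0], A[1][1] - lam))

def inv(A):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

S = ((1.0, 2.0), (0.0, 1.0))
T = ((0.0, 1.0), (1.0, 3.0))
lam = 0.5 + 1.0j   # non-real, so it is outside the (real) spectra of ST and TS

# Left-hand side: (TS - lam)^{-1} computed directly.
direct = inv(shift(mul(T, S), lam))

# Right-hand side: (1/lam) * (T (ST - lam)^{-1} S - 1).
inner = mul(mul(T, inv(shift(mul(S, T), lam))), S)
formula = tuple(tuple((inner[i][j] - (1 if i == j else 0)) / lam for j in range(2))
                for i in range(2))

max_gap = max(abs(direct[i][j] - formula[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two sides agree up to rounding error, as the algebraic identity predicts for any $\lambda \ne 0$ outside $\sigma(ST)$.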


Example: Quadratic Forms 


Extracting the spectrum of matrices is one of the most useful applications of
mathematics. Quadratic forms are expressions of degree 2 in a number of variables,
such as

$$q(x, y, z) = ax^2 + by^2 + cz^2 + dxy + eyz + fzx = \begin{pmatrix} x & y & z \end{pmatrix}\begin{pmatrix} a & d/2 & f/2 \\ d/2 & b & e/2 \\ f/2 & e/2 & c \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

They are found in the equations of conics and quadrics, the fundamental forms
of surface geometry, the inertia tensor and stress tensor of mechanics, the integral
forms of number theory, the covariances of statistics, etc. They can always be written
as $q(x) = x^\top A x$, with $A$ a symmetric matrix. We will see later that when the
coefficients are real, such matrices have real eigenvalues, $\lambda_1, \ldots, \lambda_N$, and there
exists an orthogonal matrix $P$ such that $P^{-1}AP = D$, where $D$ consists solely of the
eigenvalues on the main diagonal. So the orthogonal transformation $x \mapsto \tilde{x} := P^{-1}x$
gives a new simplified quadratic form

$$q(x) = x^\top A x = \tilde{x}^\top P^\top A P \tilde{x} = \tilde{x}^\top D \tilde{x} = \lambda_1 \tilde{x}_1^2 + \cdots + \lambda_N \tilde{x}_N^2 = \tilde{q}(\tilde{x}).$$

These eigenvalues are intrinsic to the quadratic form, in the sense that any rotation
of the variables gives a quadratic form with the same spectrum, and so represent real
information about it rather than about the choice of variables. Not surprisingly these
values were discovered before the connection with linear algebra became clear, and
called by a variety of names such as "principal curvatures", "principal moments",
"principal component variances", etc., in the different contexts. For example, a conic
that satisfies the equation $ax^2 + bxy + cy^2 = 1$ can also be represented by the
equation $\lambda\tilde{x}^2 + \mu\tilde{y}^2 = 1$, where $(\tilde{x}, \tilde{y})$ are obtained by a rotation/reflection of $(x, y)$.
Hence there are four types, depending on the signs of $\lambda$, $\mu$: ellipses, hyperbolas,
parallel lines, or the empty set.
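The classification of conics by the signs of $\lambda$, $\mu$ can be sketched in a few lines. This is a minimal illustration, assuming real coefficients and using the closed-form eigenvalues of the symmetric matrix with rows $(a, b/2)$ and $(b/2, c)$; the function names and the tolerance `tol` are choices made here, not part of the text.

```python
import math

def conic_principal_form(a, b, c):
    """Eigenvalues (lam, mu), lam >= mu, of [[a, b/2], [b/2, c]],
    the matrix of q(x, y) = a x^2 + b x y + c y^2."""
    mean = (a + c) / 2
    spread = math.hypot((a - c) / 2, b / 2)
    return mean + spread, mean - spread

def conic_type(a, b, c, tol=1e-12):
    """Classify the conic a x^2 + b x y + c y^2 = 1 by the signs of lam, mu."""
    lam, mu = conic_principal_form(a, b, c)
    if mu > tol:
        return "ellipse"          # both eigenvalues positive
    if lam > tol and mu < -tol:
        return "hyperbola"        # eigenvalues of opposite sign
    if lam > tol:
        return "parallel lines"   # one positive, one zero eigenvalue
    return "empty set"            # no positive eigenvalue

print(conic_type(1, 1, 1))    # x^2 + xy + y^2 = 1
print(conic_type(0, 1, 0))    # xy = 1
print(conic_type(1, 2, 1))    # (x + y)^2 = 1
print(conic_type(-1, 0, -1))  # -x^2 - y^2 = 1
```

For instance, $x^2 + xy + y^2 = 1$ has eigenvalues $3/2$ and $1/2$, so after a rotation it reads $\tfrac32\tilde{x}^2 + \tfrac12\tilde{y}^2 = 1$, an ellipse.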


14.1 The Spectral Radius 


Determining the exact spectral values of an element is usually a non-trivial problem. 
The fundamental theorem for the general case is: 


Theorem 14.3 


The spectrum of $T$ is a non-empty compact subset of $\mathbb{C}$. The largest extent
of $\sigma(T)$, called the spectral radius of $T$, is

$$\max\{|\lambda| : \lambda \in \sigma(T)\} = \rho(T) = \lim_{n\to\infty} \|T^n\|^{1/n}.$$


Proof $\sigma(T)$ is compact: If $|\lambda| > \rho(T)$, then $\rho(T/\lambda) = \rho(T)/|\lambda| < 1$, so $T - \lambda =
-\lambda(1 - T/\lambda)$ is invertible (Theorem 13.20). Spectral values are therefore bounded
by $\rho(T)$.

The resolvent set is none other than $f^{-1}G(\mathcal{X})$ where $f(z) := T - z$, and $G(\mathcal{X})$
is the set of invertible elements of $\mathcal{X}$. Since $G(\mathcal{X})$ is open in $\mathcal{X}$ and $f$ is continuous,
it follows that the resolvent is open (Theorem 3.7), and the spectrum is
closed in $\mathbb{C}$. More concretely, if $T - \lambda$ is invertible, and $z$ is close enough to $\lambda$, then
$|z - \lambda| = \|(T - z) - (T - \lambda)\|$ implies that $T - z$ is also invertible (Theorem 13.21).

The spectrum $\sigma(T)$, being a closed and bounded set in $\mathbb{C}$, is compact (Corollary
6.20).


$\sigma(T)$ is non-empty: Applying Theorem 13.26, with $f(z) := 1$, and a circular path
centered at the origin with radius larger than $\rho(T)$, gives

$$1 = \frac{1}{2\pi i}\oint (z-T)^{-1}\,dz.$$

But the map $z \mapsto (z-T)^{-1}$ is analytic on $\mathbb{C} \setminus \sigma(T)$ by (13.3). This would contradict
Cauchy's theorem (Theorem 12.16) were the spectrum empty.


The spectral radius is $\rho(T)$: Let $r_\sigma$ be the largest extent of $\sigma(T)$, and consider
the function $f\colon z \mapsto (z-T)^{-1}$; it is analytic on $\mathbb{C} \setminus \sigma(T)$, in particular on $\mathbb{C} \setminus B_{r_\sigma}[0]$.
So it has a Laurent series $\sum_n A_n z^n$, valid for all $|z| > r_\sigma$ (Example 13.30(5)). On
the other hand, we know that

$$(z-T)^{-1} = \frac{1}{z}(1 - T/z)^{-1} = \sum_{n=0}^{\infty} \frac{T^n}{z^{n+1}} \quad\text{for } |z| > \rho(T).$$

The two series must be identical, $\sum_{n=-\infty}^{\infty} A_n z^n = \sum_{n=0}^{\infty} T^n/z^{n+1}$, and remain valid
for all $|z| > r_\sigma$. But the second series diverges when $\rho(T) > \liminf_n |z^{-n}|^{-1/n} = |z|$
by the Cauchy-Hadamard theorem, so there can be no $z \in \mathbb{C}$ such that $r_\sigma < |z| <
\rho(T)$, in other words, $r_\sigma = \rho(T)$. □


This is a surprising result: one might expect p(T) to depend on the specific norm 
used for a square matrix T, but the spectrum of T consists of its eigenvalues, which 
are determined by an algebraic equation. 
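The norm-independence of the limit can be observed numerically. The sketch below is illustrative only: the matrix `T`, the power `n = 400` and the two norms (maximum row sum and Frobenius, both submultiplicative) are choices made here, and the helper names are hypothetical. The matrix has the single eigenvalue $0.5$, while both norms of $T$ itself are about 10.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_sum_norm(A):
    """Operator norm induced by the sup-norm on vectors."""
    return max(sum(abs(x) for x in row) for row in A)

def frobenius_norm(A):
    """Square root of the sum of squared moduli of the entries."""
    return sum(abs(x) ** 2 for row in A for x in row) ** 0.5

def root_of_power_norm(T, n, norm):
    """Return ||T^n||^(1/n) for the given norm."""
    P = T
    for _ in range(n - 1):
        P = matmul(P, T)
    return norm(P) ** (1.0 / n)

T = [[0.5, 10.0], [0.0, 0.5]]   # triangular: its only eigenvalue is 0.5

first_row = root_of_power_norm(T, 1, row_sum_norm)      # the norm of T itself
late_row = root_of_power_norm(T, 400, row_sum_norm)
late_frob = root_of_power_norm(T, 400, frobenius_norm)
print(first_row, late_row, late_frob)
```

The convergence is slow (the error decays like $(\log n)/n$ for this non-diagonalizable matrix), but both norms approach the same value $\rho(T) = 0.5$.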


Corollary 14.4 Fundamental Theorem of Algebra 


Every non-constant polynomial in C has a root. 


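Proof The roots of the polynomial equation $z^n + a_{n-1}z^{n-1} + \cdots + a_0 = 0$ are
precisely the spectral values of the matrix

$$\begin{pmatrix} 0 & \cdots & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & 1 & -a_{n-1} \end{pmatrix},$$

whose characteristic polynomial is the given one, and the spectrum of a matrix is non-empty by Theorem 14.3.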
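The link between roots and spectral values can be checked on a small case. The sketch below assumes the layout described in the proof (ones on the subdiagonal, $-a_k$ in the last column); the function names are illustrative. It uses the fact that for a root $\lambda$ the row vector $(1, \lambda, \ldots, \lambda^{n-1})$ is a left eigenvector of this matrix, and that a matrix and its transpose have the same eigenvalues.

```python
def companion(coeffs):
    """Companion matrix of z^n + a_{n-1} z^{n-1} + ... + a_0,
    where coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1          # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i] # last column holds -a_0, ..., -a_{n-1}
    return C

def left_apply(row, C):
    """Return the row vector row * C."""
    n = len(C)
    return [sum(row[i] * C[i][j] for i in range(n)) for j in range(n)]

# z^3 - 6 z^2 + 11 z - 6 = (z - 1)(z - 2)(z - 3)
C = companion([-6, 11, -6])

checks = []
for root in (1, 2, 3):
    row = [root ** k for k in range(3)]
    checks.append(left_apply(row, C) == [root * x for x in row])
print(checks)
```

Each root of the cubic satisfies the eigenvalue equation exactly, in integer arithmetic.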


Examples 14.5 


1. The smallest extent of $\sigma(T)$ is $\rho(T^{-1})^{-1}$ when $T$ is invertible (otherwise it is 0).
Thus the condition $r < \rho(T^{-1})^{-1} \le \rho(T) < R$ for a Laurent series expansion
to exist (Theorem 13.28) can be restated as "the spectrum of $T$ lies inside the ring
with radii $r$ and $R$".


2. » Every Banach division algebra is isomorphic to $\mathbb{C}$ (Gelfand-Mazur theorem).


Proof A division algebra is defined as one in which the only non-invertible element
is 0. Hence $T - \lambda$ is not invertible precisely when $T = \lambda \in \mathbb{C}1$. But $\sigma(T)$ is
non-empty, so this must be the case for some $\lambda$.

3. » Every Banach algebra, except $\mathbb{C}$, has non-zero topological divisors of zero.


Proof Suppose that the only topological divisor of zero is 0. Since the spectrum
$\sigma(T)$ of every $T$ has a non-empty boundary (Proposition 5.3), there is a $T - \lambda$
which is a topological divisor of zero, so $T = \lambda \in \mathbb{C}1$.


4. » Every commutative Banach algebra, except $\mathbb{C}$, has non-trivial ideals.


Proof Suppose the only ideals are $\{0\}$ and $\mathcal{X}$. Then the ideal generated by $T \ne 0$,
namely $\mathcal{X}T$ (in a commutative algebra), must equal $\mathcal{X}$. It follows that $ST = 1$
for some $S \in \mathcal{X}$, and $T$ is invertible. But the only Banach division algebra is $\mathbb{C}$.


5. A morphism $J\colon \mathcal{X} \to \mathcal{Y}$ may only decrease the spectrum of an element, since
a non-invertible element in $\mathcal{X}$ may become invertible in $\mathcal{Y}$, but an invertible element in
$\mathcal{X}$ cannot become non-invertible in $\mathcal{Y}$. If $J$ is an embedding, the boundary of
the spectrum in $\mathcal{X}$, consisting of topological divisors of zero, is preserved in
$\mathcal{Y}$ (Exercise 13.25(14)). The spectrum may decrease but its boundary (and the
spectral radius) does not.


6. Recall the commutant algebra $\mathcal{Y} := A'' \subseteq \mathcal{X}$ when the elements of $A$ commute
(Exercise 13.10(14)). By part (c) of that exercise, for any $T \in \mathcal{Y}$, if $T - \lambda$ is
invertible in $\mathcal{X}$ then its inverse is in $\mathcal{Y}$, so $\sigma_{\mathcal{Y}}(T) = \sigma_{\mathcal{X}}(T)$.


Little else can be said about spectra of general elements of an algebra. The following
proposition shows that the spectrum $\sigma(T)$ depends somewhat 'continuously'
on $T$:


Proposition 14.6 


If $T_n \to T$, then

$$\forall \epsilon > 0,\ \exists N,\quad n \ge N \ \Rightarrow\ \sigma(T_n) \subseteq \sigma(T) + B_\epsilon(0).$$


Proof Let $U$ be any open subset of $\mathbb{C}$ containing $\sigma(T)$,
for example $\sigma(T) + B_\epsilon(0)$. It is claimed that for all
$z \notin U$, $\|(T - z)^{-1}\| \le c$. When $|z| \ge r > \|T\|$,

$$\|(T - z)^{-1}\| = \Bigl\|\sum_{n=0}^{\infty} \frac{T^n}{z^{n+1}}\Bigr\| \le \sum_{n=0}^{\infty} \frac{\|T\|^n}{r^{n+1}} = \frac{1}{r - \|T\|},$$

while on the remaining closed and bounded set $B_r[0] \setminus U$, the continuous function
$z \mapsto \|(T - z)^{-1}\|$ is bounded (Corollary 6.16). If $\|T - S\| < 1/c$, then when $z \notin U$,
$\|(T - z)^{-1}(T - S)\| < 1$. This implies that

$$S - z = (T - z) - (T - S) = (T - z)\bigl(1 - (T - z)^{-1}(T - S)\bigr)$$

is invertible (Theorem 13.20). Thus $\sigma(S) \subseteq U$, and we have shown that any open
set that contains $\sigma(T)$ also contains $\sigma(S)$ for $S$ close enough to $T$.

For example, if $U := \sigma(T) + B_\epsilon(0)$ and $T_n$ is close enough to $T$, then
$\sigma(T_n) \subseteq U$. □


Exercises 14.7 

1. The spectrum of $(z_1, \ldots, z_N) \in \mathbb{C}^N$ is $\{z_1, \ldots, z_N\}$.

2. The spectrum of $f \in C[0, 1]$ is $\sigma(f) = \operatorname{im}(f)$.

3. Verify directly that for a matrix $A$ with eigenvalue $\lambda$, $A - \lambda$ is a divisor of zero.

4. * Prove that $\sigma(T^n) = \sigma(T)^n = \{\lambda^n : \lambda \in \sigma(T)\}$. (We will see later a broad
generalization of this (Theorem 14.25)).


5. Show that $\sigma(LR) = \{1\}$, but $\sigma(RL) = \{0, 1\}$, where $L$ and $R$ are the shift
operators.


6. Show that $ST - TS = z \ne 0$ for $S, T \in \mathcal{X}$ implies $\sigma(ST)$ is unbounded, which
is impossible (Hint: $\lambda \in \sigma(ST) \Rightarrow \lambda + z \in \sigma(ST)$).


7. The spectrum of $(S, T) \in \mathcal{X} \times \mathcal{Y}$ is $\sigma(S) \cup \sigma(T)$.


8. If $T \in B(X)$ and $S \in B(Y)$, let $T \oplus S\colon X \times Y \to X \times Y$ be defined by
$T \oplus S(x, y) := (Tx, Sy)$. Then $\sigma(T \oplus S) = \sigma(T) \cup \sigma(S)$.


9. If $\lambda$ is a boundary point of the spectrum, then $T - \lambda$ is at the boundary of $G(\mathcal{X})$,
and so is a topological divisor of zero (Proposition 13.23). Moreover, if $T - \mu$ is
invertible, then

$$\|(T - \mu)^{-1}\| \ge 1/d(\mu, \sigma(T)).$$


14.2 The Spectrum of an Operator 


An operator T on a Banach space X is invertible in B(X) when T has a continuous 
and linear inverse T~! € B(X). By the open mapping theorem, this is automatically 
true once T is bijective. So an operator T € B(X) is not invertible when one of the 
following cases holds: 


[Diagram: "T not invertible in B(X)" branches into "T not 1-1" and "T is 1-1 but not onto".]

1. $T$ is not 1-1 (i.e., $\ker T \ne 0$). In this case, $T$ is a left divisor of zero as $TS = 0$
for any non-zero $S \in B(X)$ with $\operatorname{im} S \subseteq \ker T$.


2. $T$ is 1-1, but not onto, yet it is "almost" onto, in the sense that its image is dense,
$\overline{\operatorname{im} T} = X$. Here, it cannot be the case that $\|Tx\| \ge c\|x\|$ for all $x$ and some $c > 0$,
otherwise $\operatorname{im} T$ would be closed (Example 8.13(3)) and $T$ onto. This means that
one can decrease $\|Tx\|$ but keep $\|x\|$ fixed, i.e., there are unit vectors $x_n$ such that
$Tx_n \to 0$. By taking any unit operators with $\operatorname{im} S_n = [\![x_n]\!]$, we get $TS_n \to 0$, so
$T$ is a topological left divisor of zero.


3. $T$ is 1-1, and its image is not even dense in $X$. In this case, by Proposition 11.18,
there is a non-zero $S \in B(X)$ with kernel containing $\operatorname{im} T$, so $ST = 0$, and $T$ is
a right divisor of zero.


The spectrum of an operator $T \in B(X)$ thus consists of $\lambda$ in:


1. the point spectrum $\sigma_p(T)$, when $T - \lambda$ is not 1-1, i.e., $Tx = \lambda x$ for some $x \ne 0$;
we say that $\lambda$ is an eigenvalue and $x$ an eigenvector of $\lambda$ (note that a non-zero
multiple of an eigenvector is another eigenvector, so they are often taken to be of
unit length); the subspace $\ker(T - \lambda)$ of eigenvectors of $\lambda$ (together with the zero
vector) is called its eigenspace.


2. the continuous spectrum $\sigma_c(T)$, when $T - \lambda$ is 1-1, not onto, but $\overline{\operatorname{im}(T - \lambda)} = X$.


3. the residual spectrum $\sigma_r(T)$, when $T - \lambda$ is 1-1, and $\overline{\operatorname{im}(T - \lambda)} \ne X$.


Proposition 14.8 


Eigenvectors of distinct eigenvalues are linearly independent. 


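Proof Let $v_i \ne 0$ be eigenvectors associated with the distinct eigenvalues $\lambda_i$,
$i = 1, 2, \ldots$, so that $(T - \lambda)v_i = (\lambda_i - \lambda)v_i$. The sum $\sum_{i=1}^{N} \alpha_i v_i = 0$ implies

$$\begin{aligned}
0 &= (T - \lambda_2)\cdots(T - \lambda_N)\sum_{i=1}^{N} \alpha_i v_i \\
  &= (T - \lambda_2)\cdots(T - \lambda_{N-1})\sum_{i=1}^{N} \alpha_i (T - \lambda_N) v_i \\
  &= (T - \lambda_2)\cdots(T - \lambda_{N-1})\sum_{i=1}^{N-1} \alpha_i (\lambda_i - \lambda_N) v_i \\
  &= \cdots = \alpha_1(\lambda_1 - \lambda_2)\cdots(\lambda_1 - \lambda_N)v_1,
\end{aligned}$$

forcing $\alpha_1 = 0$. Since the argument can be repeated for any other index $i$, we have
$\alpha_i = 0$. □

Proposition 14.9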


If $\lambda$ is a limit of eigenvalues, or is in $\sigma_c(T)$, or is a boundary point of $\sigma(T)$,
then $\lambda$ is an approximate eigenvalue, meaning there are unit vectors $x_n$
such that

$$(T - \lambda)x_n \to 0 \quad\text{as } n \to \infty.$$


Proof If $\lambda_n \to \lambda$ and $Tx_n = \lambda_n x_n$ with $\|x_n\| = 1$, then
$(T - \lambda)x_n = (\lambda_n - \lambda)x_n \to 0$.


$\lambda$ is an approximate eigenvalue exactly when $T - \lambda$ is a topological left divisor
of zero, because suppose there are unit operators $S_n$ with $(T - \lambda)S_n \to 0$. Let $x_n$
be vectors such that $\|S_n x_n\| = 1$ and $\|x_n\| \le 2$ (possible since $\|S_n\| = 1$); then
$(T - \lambda)S_n x_n \to 0$, and $\lambda$ is an approximate eigenvalue.
Conversely, given $(T - \lambda)x_n \to 0$ with $x_n$ unit vectors, let $S_n := x_n\phi$ for any
$\phi \in X^*$ with unit norm. Then $\|S_n\| = 1$ and $(T - \lambda)S_n = (T - \lambda)x_n\phi \to 0$ as
$n \to \infty$.

This includes the case when $\lambda$ is at the boundary of $\sigma(T)$ (Proposition 13.23),
and when $\lambda \in \sigma_c(T)$ as we have just seen at the beginning of this section. □


Examples 14.10 


1. » The spectrum of the left-shift operator $L(a_n) := (a_{n+1})$, on $\ell^\infty$, is the unit
closed ball.


Proof The norm of $L$ is 1, so $\sigma(L) \subseteq B_1[0]$. To find its eigenvalues, we need to
solve $Lx = \lambda x$ for some non-zero $x = (a_n) \in \ell^\infty$, i.e.,

$$\forall n, \quad a_{n+1} = \lambda a_n, \quad |a_n| \le c.$$

This recurrence relation gives $a_n = \lambda^n a_0$, satisfying $|a_0||\lambda|^n = |a_n| \le c$. Thus
the only possible candidates for eigenvalues are $|\lambda| \le 1$. In fact, for any such $\lambda$,
the sequence $(1, \lambda, \lambda^2, \ldots)$ is an eigenvector in $\ell^\infty$. Hence $\sigma(L) = B_1[0]$, and
all spectral points are eigenvalues.


2. » The spectrum of the left-shift operator on $\ell^1$ is the unit closed ball.


Proof The same analysis as in Example 1 applies: $\rho(L) \le \|L\| = 1$, and $a_n =
\lambda^n a_0$. This time, the condition $x \in \ell^1$ is $\sum_n |a_n| = |a_0| \sum_n |\lambda|^n < \infty$. This
is only possible when $|\lambda| < 1$. Once again, but only for $|\lambda| < 1$, the sequence
$(1, \lambda, \lambda^2, \ldots)$ is an eigenvector in $\ell^1$. Still, since it is closed, bounded by 1, and
contains $B_1(0)$, the spectrum must be the closed disk. The spectral values in
the interior are eigenvalues, and those on the circular perimeter are approximate
eigenvalues.

3. Let $T\colon \ell^2 \to \ell^2$ be the multiplier operator $T(a_n) := (b_n a_n)$ where $b_n$ are
bounded. Its eigenvalues are $b_n$, and its spectrum is $K := \overline{\{b_1, b_2, \ldots\}}$.


Proof For eigenvalues, $T(a_n) = (b_n a_n) = \lambda(a_n)$, so $(b_n - \lambda)a_n = 0$ for all $n$.
This implies $\lambda = b_n$ for some $n$, otherwise $(a_n) = 0$. In fact, $Te_n = b_n e_n$, so $b_n$
is indeed an eigenvalue. Now, suppose $\lambda$ is neither one of the $b_n$ nor a limit point of $\{b_1, b_2, \ldots\}$; there
is then a minimum positive distance between $\lambda$ and $K$, i.e., $|\lambda - b_n| \ge d > 0$.
So the equation $(T - \lambda)(a_n) = (c_n)$ can be inverted, $a_n = c_n/(b_n - \lambda)$, with
$|a_n| \le |c_n|/d$; $\|(T - \lambda)^{-1}\| \le 1/d$. The spectrum therefore must include the
eigenvalues and their limit points, but nothing else.


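4. Let $T\colon L^\infty[0, 1] \to L^\infty[0, 1]$ be defined by $Tf(x) := \int_{1-x}^{1} f(s)\,ds$. Then $T$ is
linear, and continuous with $\|T\| \le 1$ since

$$\|Tf\|_\infty = \sup_{x \in [0,1]} \Bigl|\int_{1-x}^{1} f(s)\,ds\Bigr| \le \|f\|_\infty \sup_{x \in [0,1]} \int_{1-x}^{1} ds = \|f\|_\infty.$$

For eigenvalues, we need to solve $\int_{1-x}^{1} f(t)\,dt = \lambda f(x)$. Differentiating twice
gives $f''(x) + \frac{1}{\lambda^2} f(x) = 0$ with boundary conditions $f(0) = 0 = f'(1)$. Thus
the candidate eigenvectors (or "eigenfunctions") are $f(x) = \sin(x/\lambda)$ with
$\lambda = 2/k\pi$, $k$ odd; substituting back into the integral equation shows that $\sin(1/\lambda) = 1$ is also needed, which selects $k \equiv 1 \pmod 4$ (negative $k$ included), i.e., $\lambda = 2/\pi, -2/(3\pi), 2/(5\pi), \ldots$. The spectrum must also include 0, because it is their limit
point, but at this stage we cannot conclude anything further about the spectrum.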


5. If $S\colon X \to Y$, $T\colon Y \to X$ are operators, then $ST$ and $TS$ share the same
non-zero eigenvalues.


Proof If $STx = \lambda x$ ($x \ne 0$), then $TS(Tx) = T(ST)x = \lambda(Tx)$, so either
$Tx = 0$, in which case $\lambda = 0$, or $Tx$ is an eigenvector of $TS$ with the same
eigenvalue $\lambda$; similarly, every non-zero eigenvalue of $TS$ is also an eigenvalue of
$ST$. (Compare with Example 14.2(5d).)


6. Gershgorin's theorem: If $T = [T_{ij}]$ is an operator on $c_0$, then each eigenvalue
belongs to a disk $B_r[T_{jj}]$ for some $j$, where $r := \sum_{i \ne j} |T_{ji}|$.


Proof Let $x = (a_i)$ be an eigenvector of $T$ and let $|a_j|$ be its largest coefficient.
Then rearranging $Tx = \lambda x$ we get

$$\lambda a_j = \sum_i T_{ji} a_i = T_{jj} a_j + \sum_{i \ne j} T_{ji} a_i,$$
$$\therefore\quad |\lambda - T_{jj}|\,|a_j| \le \sum_{i \ne j} |T_{ji}|\,|a_i| \le r\,|a_j|.$$

7. Real eigenvalues of real operators have real eigenvectors, i.e., if $X$ is a real Banach
space, then $T \in B(X)$ is not guaranteed to have a spectral element, but it will have
when considered as an operator on the complex space $X + iX$. Nevertheless if
the eigenvalue is real, with eigenvector $u + iv$, then $u$ and $v$ are also eigenvectors
(unless 0),

$$T(u + iv) = \lambda(u + iv) \ \Rightarrow\ Tu = \lambda u,\ Tv = \lambda v.$$
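Gershgorin's theorem from the examples above can be illustrated on a finite matrix, which may be regarded as an operator acting on finitely many coordinates. The sketch below is only an illustration: the matrix `M` is an arbitrary choice made here, and the helper names are hypothetical. The eigenvalues of the $2\times 2$ matrix are obtained from the quadratic formula for its characteristic polynomial.

```python
import cmath

def gershgorin_disks(M):
    """Return (center, radius) pairs: center T_jj, radius sum over i != j of |T_ji|."""
    return [(row[j], sum(abs(x) for i, x in enumerate(row) if i != j))
            for j, row in enumerate(M)]

def eigenvalues_2x2(M):
    """Roots of lambda^2 - trace * lambda + det = 0."""
    (a, b), (c, d) = M
    half_trace = (a + d) / 2
    root = cmath.sqrt(half_trace ** 2 - (a * d - b * c))
    return half_trace + root, half_trace - root

M = ((4.0, 1.0), (2.0, -3.0))
disks = gershgorin_disks(M)     # disks centered at 4 and -3
eigs = eigenvalues_2x2(M)

# Every eigenvalue should fall in at least one disk.
covered = all(any(abs(lam - center) <= radius + 1e-12 for center, radius in disks)
              for lam in eigs)
print(disks, eigs, covered)
```

Here the two disks are disjoint, so each contains exactly one eigenvalue; in general the theorem only guarantees membership in the union.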


The Spectrum of the Adjoint 


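There is a relation between the eigenvalues of $T^\top$ and the residual spectrum of $T$:


Proposition 14.11

$$\sigma(T^\top) = \sigma(T),$$
$$\sigma_r(T) \subseteq \sigma_p(T^\top) \subseteq \sigma_p(T) \cup \sigma_r(T),$$
$$\sigma_c(T^\top) \subseteq \sigma_c(T).$$


Proof (i) $T - \lambda$ is invertible in $B(X)$, if and only if, its adjoint is invertible (Exercise
11.32(7)),

$$(T^\top - \lambda)^{-1} = \bigl((T - \lambda)^{-1}\bigr)^\top.$$

So $\lambda \notin \sigma(T) \Leftrightarrow \lambda \notin \sigma(T^\top)$.
(ii) By definition, $\lambda \in \sigma_p(T^\top)$ when there is a $\phi \ne 0$ in $X^*$ such that

$$\phi \circ (T - \lambda) = (T^\top - \lambda)\phi = 0.$$

This implies there is an $x \in X$, $\phi x \ne 0$, so that $x \notin \overline{\operatorname{im}(T - \lambda)}$. In turn, if $x \in
X \setminus \overline{\operatorname{im}(T - \lambda)}$ exists, then there is a $\phi \ne 0$ such that $\phi(T - \lambda) = 0$ (Proposition
11.18), and we have proved

$$\lambda \in \sigma_p(T^\top) \ \Leftrightarrow\ \overline{\operatorname{im}(T - \lambda)} \ne X.$$

This condition is certainly satisfied when $\lambda$ is a residual spectral value of $T$, but
not when it is in the continuous spectrum of $T$, so

$$\lambda \in \sigma_r(T) \ \Rightarrow\ \lambda \in \sigma_p(T^\top) \ \Rightarrow\ \lambda \notin \sigma_c(T).$$

(iii) When $A^\top$ is 1-1 but $\overline{\operatorname{im} A^\top} = X^*$, then we can infer, by Proposition 11.30,
that (a) $(\ker A)^\perp \supseteq \overline{\operatorname{im} A^\top} = X^*$, so $A$ is 1-1; and (b) $(\operatorname{im} A)^\perp = \ker A^\top = 0$, so
$\overline{\operatorname{im} A} = X$. Applying this to $A := T - \lambda$ when $\lambda \in \sigma_c(T^\top)$, we find that $T - \lambda$ is
1-1 and has a dense image, that is, $\lambda \in \sigma_c(T)$. □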


Examples 14.12 


1. When $T^{\top\top} = T$ (e.g. on a Hilbert space) then $\sigma_r(T^\top) \subseteq \sigma_p(T)$ as well as
$\sigma_c(T^\top) = \sigma_c(T)$.

2. In $c_0$ (or $\ell^2$), the left-shift and right-shift operators have

$$\sigma_p(L) = B_1(0), \quad \sigma_r(L) = \emptyset, \quad \sigma_c(L) = S^1,$$
$$\sigma_p(R) = \emptyset, \quad \sigma_r(R) = B_1(0), \quad \sigma_c(R) = S^1.$$


Proof That $\sigma_p(L^\top) = \emptyset$ has already been shown since $L^\top$ is the right shift on
$\ell^1$; in the same way can be proved $\sigma_p(L) = B_1(0)$. Applying this proposition,
we find that $\sigma_r(L) \subseteq \sigma_p(L^\top) = \emptyset$, leaving $\sigma_c(L) = S^1$.


Similarly for $R$, $\sigma_r(R) \subseteq \sigma_p(R^\top) \subseteq \sigma_r(R)$ since $\sigma_p(R) = \emptyset$ (prove!), hence
$\sigma_r(R) = \sigma_p(R^\top) = B_1(0)$ and $\sigma_c(R) = S^1$.


Exercises 14.13 


1. Show that the right-shift operator $R$ (on $\ell^\infty$ or $\ell^1$) has no eigenvalues.


2. The right-shift operator $R \in B(\ell^1)$ and its adjoint $L \in B(\ell^\infty)$ have spectra

$$\sigma(L) = \sigma_p(L) = B_1[0] = \sigma_r(R) = \sigma(R).$$


3. The spectrum of $L$ on $\ell^1(\mathbb{Z})$ is the circle $S^1$. This is an example of the hollowing
out of a spectrum when the algebra increases, in this case when $\ell^1$ is embedded
in $\ell^1(\mathbb{Z})$.


4. The operator $T(a_0, a_1, \ldots) := (a_0, 0, a_1, a_2, \ldots)$, on $c_0$, has a single eigenvalue
1, but its adjoint has $\sigma_p(T^\top) = B_1(0) \cup \{1\}$. Deduce that $\sigma_p(T) = \{1\}$, $\sigma_r(T) =
B_1(0)$, and $\sigma_c(T) = S^1 \setminus \{1\}$.


But the same operator restricted to $\ell^1$ has a single eigenvalue 1 and no continuous
spectrum.


5. The operator $T(a_0, a_1, \ldots) := (a_0, 0, a_1, a_2/2, a_3/3, \ldots)$, on $c_0$, has a single
eigenvalue 1, and its adjoint has two eigenvalues, 1 and 0.


6. The spectrum of the multiplier operator $Tx := ax$, on $\ell^2$, has no residual spectrum.


7. The spectrum of $x\phi \in B(X)$, where $x \in X$ and $\phi \in X^*$, consists of the
eigenvalues $\phi x$ and 0 (unless $X$ is 1-dimensional).


8. Let $T\colon X \to Y$, $S\colon Y \to X$ be operators and consider $R \in B(X \times Y)$ defined by
$R(x, y) := (Sy, Tx)$; the 'matrix' form of $R$ looks like $\begin{pmatrix} 0 & S \\ T & 0 \end{pmatrix}$. Then non-zero
eigenvalues of $R$ come in pairs $\pm\lambda$. (Hint: consider $(x, -y)$.)


9. Let $T\colon C[0, 1] \to C[0, 1]$ be defined by $Tf(x) := xf(x)$. Show that $T$ is linear
and continuous, find its norm and show that its spectrum is the line $[0, 1]$ in $\mathbb{C}$,
consisting of only the residual part.

More generally the spectrum of $Tf := gf$ in $C[0, 1]$, where $g \in C[0, 1]$, is
$\operatorname{im} g$.
The reader is encouraged to explore the spectrum of this operator in other spaces,
such as $L^1[0, 1]$ or $L^2[0, 1]$.


10. * Let $V\colon C[0, 1] \to C[0, 1]$ be the Volterra operator $Vf(x) := \int_0^x f$. Show
that

$$V^{n+1}f(x) = \frac{1}{n!}\int_0^x (x - y)^n f(y)\,dy,$$

and that $\|V^n\| \le 1/n!$. Deduce, using the spectral radius formula, that its spectrum
is just $\{0\}$. Show that 0 is not an eigenvalue (hint: differentiate!) but a
residual boundary spectral value.


11. Find the eigenvalues of $Tf(x) := \int_0^1 xy^2 f(y)\,dy$ on $C[0, 1]$.


12. The spectrum of an isometry $T$ lies in $B_1[0]$. Any eigenvalues or approximate
eigenvalues lie in $e^{i\mathbb{R}}$. If $T$ is an invertible isometry, then $\sigma(T) \subseteq e^{i\mathbb{R}}$, otherwise
the spectrum must be the whole closed unit disk (e.g. the right-shift operator).
(Hint: $T - \lambda = T(1 - \lambda S)$.)

13. Show that the set $\{T \in B(X) : T \text{ is 1-1 and has a closed image}\}$ is open in
$B(X)$. (Hint: Proposition 11.3.)


14.3 Spectra of Compact Operators 


Ascents and Descents 


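For any operator, the eigenspace associated with an eigenvalue $\lambda$ is $\ker(T - \lambda)$.
But this is not the whole story: for example, $T := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ has just one eigenvalue,
and a one-dimensional eigenspace generated by $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$; the vector $v := \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ is mapped
by $T$ to $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and only a second application of $T$ kills it off. We can think of it as
a "generalized" eigenvector, with $(T - \lambda)^2 v = 0$. In general, one can consider the
spaces of vectors that vanish when $(T - \lambda)^n$ is applied to them. Two nested sequences
of spaces can be formed (here shown for $\lambda = 0$),


• an ascending sequence

$$0 \subseteq \ker T \subseteq \ker T^2 \subseteq \cdots \subseteq \ker T^n \subseteq \cdots \subseteq \bigcup_n \ker T^n,$$

• a descending sequence

$$X \supseteq \operatorname{im} T \supseteq \operatorname{im} T^2 \supseteq \cdots \supseteq \operatorname{im} T^n \supseteq \cdots \supseteq \bigcap_n \operatorname{im} T^n.$$

Finite Ascents and Descents

Suppose there is an $n$ such that $\ker T^n = \ker T^{n+1}$, i.e., for all $x$,

$$T^{n+1}x = 0 \ \Rightarrow\ T^n x = 0.$$

Substituting $Tx$ instead of $x$ gives

$$T^{n+2}x = 0 \ \Rightarrow\ T^{n+1}x = 0 \ \Rightarrow\ T^n x = 0,$$

and $\ker T^{n+2} = \ker T^{n+1} = \ker T^n$. By induction, all the subsequent spaces in the
ascending sequence are identical, $\ker T^{n+k} = \ker T^n$. Operators with this property
are said to have a finite ascent up to $n$, $0 \subseteq \ker T \subseteq \cdots \subseteq \ker T^n$.

Similarly, if $\operatorname{im} T^m = \operatorname{im} T^{m+1}$ then for any $x \in \operatorname{im} T^{m+1}$,

$$x = T^{m+1}y = T(T^m y) = T(T^{m+1}z) = T^{m+2}z \in \operatorname{im} T^{m+2}.$$

By induction, $\operatorname{im} T^{m+k} = \operatorname{im} T^m$. Operators with this property are said to have a
finite descent down to $m$.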


Proposition 14.14

An operator $T$ has

(i) finite ascent up to at most $n$ $\iff$ $\operatorname{im} T^n \cap \ker T^k = 0$, $\forall k$,
(ii) finite descent down to at most $m$ $\iff$ $X = \ker T^m + \operatorname{im} T^k$, $\forall k$,
(iii) finite ascent up to $n$ and descent down to $m$ $\Rightarrow$ $m = n$ and
$$X = \ker T^n \oplus \operatorname{im} T^n.$$

Proof (i) If $\operatorname{im} T^n \cap \ker T = 0$, then $T^{n+1}x = 0 \Rightarrow T^n x \in \operatorname{im} T^n \cap \ker T = 0$,
and $T$ has finite ascent up to at most $n$.

For the converse, let $x \in \operatorname{im} T^n \cap \ker T^k$, that is, $x = T^n y$ and $T^k x = 0$. Then
$T^{n+k}y = 0$ and $y \in \ker T^{n+k} = \ker T^n$; so $x = T^n y = 0$.

(ii) Let $x \in X$, then $T^m x = T^{m+1}y = \cdots = T^{m+k}z$, assuming finite descent to $m$.
So $T^m(x - T^k z) = 0$ and $x = T^k z + (x - T^k z) \in \operatorname{im} T^k + \ker T^m$.

Conversely, if $X = \operatorname{im} T + \ker T^m$, then for any $x = Ty + z$, we have $T^m x = T^{m+1}y$
and $\operatorname{im} T^m = \operatorname{im} T^{m+1}$.

(iii) Suppose $\operatorname{im} T^n = \operatorname{im} T^{n+1}$, but $\ker T^n \subset \ker T^{n+1}$. Then there is an $x_1$ such
that $T^{n+1}x_1 = 0$ but
$$0 \ne T^n x_1 = T^{n+1}x_2 = T^{n+2}x_3 = \cdots$$
so $x_k \in \ker T^{n+k} \setminus \ker T^{n+k-1}$ and $T$ has an infinite ascent. This shows that a finite
ascent cannot be longer than the descent.

Next suppose the ascent goes up to $\ker T^n = \ker T^{n+1}$ but the descent goes down to
$\operatorname{im} T^m = \operatorname{im} T^{m+1}$ with $m > n$. Then for any $x \in X$, there is a $y$ such that
$$T^m x = T^{m+1}y \ \Rightarrow\ T^m(x - Ty) = 0 \ \Rightarrow\ x - Ty \in \ker T^m = \ker T^n \ \Rightarrow\ T^n x = T^{n+1}y,$$
so a finite descent cannot be longer than the ascent.
Combining the results of (i) and (ii) gives $X = \ker T^n \oplus \operatorname{im} T^n$. □
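As a finite-dimensional illustration of Proposition 14.14, the following sketch computes $\dim\ker T^k$ and $\dim\operatorname{im} T^k$ for a small matrix by exact Gaussian elimination. The matrix `T` and the helper names are chosen here purely for illustration; they do not come from the text.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(A):
    """Rank by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def kernel_and_image_dims(T, max_power):
    """Return [(k, dim ker T^k, dim im T^k)] for k = 0..max_power."""
    n = len(T)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # T^0 = I
    dims = []
    for k in range(max_power + 1):
        rk = rank(power)
        dims.append((k, n - rk, rk))
        power = matmul(power, T)
    return dims

# Illustrative matrix: a 2x2 nilpotent block together with the eigenvalue 2.
T = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 2]]
dims = kernel_and_image_dims(T, 4)

# First k at which the kernels (respectively images) stop changing.
ascent = next(k for (k, ker, _), (_, ker_next, _) in zip(dims, dims[1:])
              if ker == ker_next)
descent = next(k for (k, _, im), (_, _, im_next) in zip(dims, dims[1:])
               if im == im_next)
```

For this matrix the ascent and descent coincide, and $\dim\ker T^2 + \dim\operatorname{im} T^2 = 3 = \dim X$, in line with part (iii).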


Proposition 14.15 (Fredholm Alternative)

A Fredholm operator $T$ with

(i) finite ascent, satisfies $\operatorname{index}(T) \le 0$,
(ii) finite descent, satisfies $\operatorname{index}(T) \ge 0$,
(iii) finite ascent and descent, satisfies $\operatorname{index}(T) = 0$ and
$$T \text{ is 1-1} \iff T \text{ is onto}.$$

Proof Recall that the codimension of a closed subspace $Y \subseteq X$ is defined as
$\dim(X/Y)$, that Fredholm operators have finite-dimensional kernels and finite-codimensional
images, and $\operatorname{index}(T) = \dim\ker T - \operatorname{codim}\operatorname{im} T$ (Definition 11.12). For
$T$ with finite ascent to $n$, by the index theorem,
$$0 \le \operatorname{codim}\operatorname{im} T^k = \dim\ker T^k - \operatorname{index}(T^k) = \dim\ker T^n - k\operatorname{index}(T), \quad \text{for } k \ge n.$$
Since $k$ can be arbitrarily large, it must be the case that $\operatorname{index}(T) \le 0$.
For Fredholm operators with finite descent to $m$,
$$0 \le \dim\ker T^k = \operatorname{codim}\operatorname{im} T^k + \operatorname{index}(T^k) = \operatorname{codim}\operatorname{im} T^m + k\operatorname{index}(T), \quad \text{for } k \ge m.$$
This time, we must have $\operatorname{index}(T) \ge 0$.

A special case is when $m = n = 0$, known as the Fredholm alternative: $\ker T = 0$
if, and only if, $\operatorname{im} T = X$, i.e., $T$ is 1-1 $\iff$ $T$ is onto; in other words, $T$ is either
invertible or it is neither 1-1 nor onto. □

Ivar Fredholm (1866-1927) studied p.d.e.s under Mittag-Leffler
in 1893 at the new University of Stockholm; he saw the connection
between Volterra's equation and potential theory, especially
in 1899 while working on Dirichlet's problem; in 1903
he analyzed the theory of general integral equations
$f(x) - \lambda\int k(x,y)f(y)\,dy = g(x)$, covering much that was then known
about boundary value problems (mostly self-adjoint), proved
the Fredholm alternative and defined the Fredholm determinant
$\det(1-K) = e^{-\sum_n \frac{1}{n}\operatorname{tr} K^n}$. He was then 'distracted' by
actuarial science and government.

Fig. 14.1 Fredholm


Examples 14.16 


1. The spaces $M := \operatorname{im} T^m$ and $N := \ker T^n$ are both $T$-invariant and such that
$T|_M$ is an isomorphism while $T|_N$ is nilpotent.


2. For matrices, the Fredholm alternative boils down to the statement that either 
Ax = b has a unique solution or Ax = 0 has non-trivial solutions. 


3. The Fredholm alternative only applies to (Fredholm) operators with finite ascent 
and descent; e.g. the right-shift operator is 1—1 but not onto. 


4. If $T$ is Fredholm with finite ascent and descent, then $\dim\ker T = \dim\ker T^\top$
(Exercise 11.32(10)).


The Spectrum of a Compact Operator 


The following two results are peaks in the landscape of Operator Theory. 


Proposition 14.17 


Let $T : X \to X$ be compact on a Banach space $X$, then $I - T$ is a Fredholm
operator with finite ascent and descent.

Proof $I - T$ is Fredholm since it is invertible up to the compact operator $T$
(Proposition 11.14).

Suppose $S := I - T$ has infinite ascent, so $\ker S^{n-1} \subset \ker S^n$. By Riesz's lemma
(Proposition 8.20), choose unit vectors $x_n \in \ker S^n$ with $\|x_n + \ker S^{n-1}\| \ge \tfrac12$. Then
for $m < n$,
$$\|Tx_n - Tx_m\| = \|(x_n - x_m) - S(x_n - x_m)\| \ge \tfrac12$$
since $S^{n-1}(x_m + S(x_n - x_m)) = 0$. So $(Tx_n)$ has no Cauchy subsequence, contradicting
the compactness of $T$.

Suppose $S$ has infinite descent, with $\operatorname{im} S^{n-1} \supset \operatorname{im} S^n$. One can choose unit
vectors $x_n \in \operatorname{im} S^n$ with $\|x_n + \operatorname{im} S^{n+1}\| \ge \tfrac12$. Then for $m > n$,
$$\|Tx_n - Tx_m\| = \|(x_n - x_m) - S(x_n - x_m)\| \ge \tfrac12$$
since $x_m + S(x_n - x_m) \in \operatorname{im} S^{n+1}$. Again this would contradict the hypothesis.
It follows from the propositions above that the index of $S$ vanishes and
$\dim\ker S^\top = \dim\ker S$. □


Theorem 14.18 Riesz-Schauder 


If $T \in B(X)$ is compact, then

(i) its spectrum $\sigma(T)$ is a countable set, whose only possible limit point
is 0,
(ii) each non-zero $\lambda \in \sigma(T)$ is an eigenvalue with a finite-dimensional
eigenspace $\ker(T - \lambda)$,
(iii) $T^\top$ has the same non-zero eigenvalues and eigenspace dimensions as
$T$.

Proof For $\lambda \ne 0$, $T - \lambda = -\lambda(I - T/\lambda)$ is a Fredholm operator with finite ascent and
descent, so its kernel is finite dimensional and it satisfies the Fredholm alternative,
namely it is either invertible ($\lambda \notin \sigma(T)$) or not 1-1 ($\lambda$ is an eigenvalue). $T - \lambda$ has
index 0, so $T^\top$ has the same number of eigenvectors of $\lambda$ as $T$,
$$\dim\ker(T^\top - \lambda) = \dim\operatorname{im}(T - \lambda)^\perp = \operatorname{codim}\operatorname{im}(T - \lambda) = \dim\ker(T - \lambda).$$

Consider those eigenvalues $\lambda$ for which $|\lambda| \ge \epsilon > 0$. Taking any list of them, $\lambda_n$
(distinct), choose a unit eigenvector $e_n$ for each, such that $\|e_n + [\![e_1, \ldots, e_{n-1}]\!]\| \ge \tfrac12$
(Propositions 8.20 and 14.8). Hence, taking $n > m$, say,
$$\|Te_n - Te_m\| = \|\lambda_n e_n - \lambda_m e_m\| = |\lambda_n|\,\Big\|e_n - \frac{\lambda_m}{\lambda_n}e_m\Big\| \ge \tfrac12|\lambda_n| \ge \tfrac{\epsilon}{2}.$$

Now the bounded set $\{e_1, e_2, \ldots\}$ is mapped to $\{Te_1, Te_2, \ldots\}$. If the first set is
infinite, the latter set would have no Cauchy subsequence, contradicting the compactness
of $T$. So the number of such eigenvectors, and corresponding eigenvalues,
is finite. The rest of the eigenvalues must be within $\epsilon$ of 0. By taking $\epsilon = 1/n \to 0$,
it follows that the number of non-zero eigenvalues is countable. □

To clarify, in finite dimensions, the set of eigenvalues is finite and need not include
0, but in infinite dimensions, 0 must be part of the spectrum (else $I = T^{-1}T$ is
compact). If there is an infinite sequence of non-zero eigenvalues, then $\lambda_n \to 0$, and
0 is an approximate eigenvalue. What remains to complete the theory is to find the
form of $T$ on each generalized eigenspace.


Proposition 14.19 Jordan Canonical Form 


On each finite-dimensional space $\ker(T - \lambda)^n$ ($\lambda \ne 0$) of a compact operator
$T$ on a Banach space $X$, there is a matrix of $T$ consisting of blocks on the
main diagonal, each of the type
$$\begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}.$$

Proof The operator $T$ can be split as $\lambda + (T - \lambda)$. The latter is nilpotent on the
subspace $\ker(T - \lambda)^n$ (finite dimensional since $(T - \lambda)^n$ is Fredholm), while $\lambda I$ is
diagonal. This is the claimed Jordan form, once it is shown that a nilpotent operator
has the following form.

A nilpotent operator on a finite-dimensional space can be represented by a matrix
of 0s except for 1s and 0s in the super-diagonal: Suppose $A$ is a nilpotent operator
of order $N$, $A^N = 0$; it has a descending sequence down to $N$, and an ascending
sequence up to $N$, $0 \subset \ker A \subset \cdots \subset \ker A^N$. For each non-zero vector $A^{N-1}u \in \operatorname{im} A^{N-1}$
there is a sequence of vectors $e_1 := A^{N-1}u$, $e_2 := A^{N-2}u$, ..., $e_N := u$.
They are linearly independent because $e_i \in \ker A^i \setminus \ker A^{i-1}$, so to have
$e_m \in [\![e_1, \ldots, e_{m-1}]\!] \subseteq \ker A^{m-1}$ is impossible. Since $Ae_i = e_{i-1}$ and $Ae_1 = 0$, the
matrix of $A$ restricted to the space generated by these vectors is
$$\begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}.$$
$A$ remains nilpotent on the rest of the space $\ker A^N / [\![e_1, \ldots, e_N]\!]$, with perhaps
a lower order. The same argument can be repeated to yield other sets of independent
vectors. As $X = \ker A^N$ is finite-dimensional, this process ends with a finite basis
for $X$ and the matrix of $A$ with respect to it consists of such blocks placed on the
diagonal. □


Examples 14.20 


1. The total number of $\lambda$s in a Jordan matrix, called its algebraic multiplicity, is the
dimension of $\ker(T - \lambda)^n$, the largest generalized eigenspace. The number of
Jordan blocks associated with $\lambda$ is $\dim\ker(T - \lambda)$, called the geometric multiplicity
of $\lambda$. The size of the largest Jordan block is sometimes called its (Jordan)
index. For example, the matrix below has an eigenvalue 2 with algebraic
multiplicity 4, geometric multiplicity 2, and index 3; the other eigenvalue 3 has
algebraic multiplicity 2, geometric multiplicity 1, and index 2.
$$\begin{pmatrix} 2 & 1 & & & & \\ & 2 & 1 & & & \\ & & 2 & & & \\ & & & 2 & & \\ & & & & 3 & 1 \\ & & & & & 3 \end{pmatrix}$$

2. The set of $N \times N$ matrices with distinct eigenvalues is dense and open in $B(\mathbb{C}^N)$.

Proof Suppose a matrix $A$ has the Jordan-form matrix $A = D + C$ where $D$ is
diagonal with the eigenvalues $\lambda_1, \ldots, \lambda_n$ and $C$ is nilpotent. Alter each eigenvalue
slightly so $\lambda_i'$ are all distinct and let $A' := D' + C$; then $\|A' - A\| = \|D' - D\| = \max_i |\lambda_i' - \lambda_i| < \epsilon$.

Because of this, the Jordan canonical form of a numerical matrix is impossible to
calculate, due to the limited accuracy of the matrix coefficients; small changes in
the coefficients result in a diagonal Jordan matrix with distinct eigenvalues.
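The sensitivity described above can be seen already for a $2 \times 2$ Jordan block. The sketch below is only an illustration: it perturbs the lower-left entry by a hypothetical `eps` (a different perturbation from the one used in the proof, which alters the diagonal), and the eigenvalues split into the distinct pair $2 \pm \sqrt{\epsilon}$.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace - root) / 2, (trace + root) / 2)

eps = 1e-12                                 # assumed size of the rounding error
exact = eigenvalues_2x2(2, 1, 0, 2)         # Jordan block: double eigenvalue 2
perturbed = eigenvalues_2x2(2, 1, eps, 2)   # corner entry changed by eps
gap = abs(perturbed[1] - perturbed[0])      # close to 2 * sqrt(eps)
```

A change of size $10^{-12}$ in one coefficient moves the eigenvalues apart by about $2 \times 10^{-6}$, so the perturbed matrix is diagonalizable and its Jordan form no longer contains the original block.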


Exercises 14.21 In these exercises, let $K$ be a compact operator on a Banach
space $X$.

1. When $T$ is 1-1, the ascending sequence of spaces are all 0.
When $T$ is onto, the descending sequence of spaces are all $X$.
For the matrix ¢ a the ascending and descending sequences are the same. 


2. The left-shift operator $L$ is onto and has an infinite ascending sequence; $R$ is
1-1 and has an infinite descending sequence.

3. The operator $f(x) \mapsto xf(x)$ acting on $C[0,1]$ is 1-1, and also has an infinite
descending sequence, e.g. each of the functions $1, x, x^2, \ldots$ belongs to a different
image space.

4. If $T$ has a finite descent then $T^\top$ has a finite ascent.

5. (a) Suppose that $\ker T^n \subseteq \operatorname{im} T$ for some $n$. Show that $T^{n-1}x = 0 \Rightarrow x = T^2 z$,
so
$$\ker T^{n-1} \subseteq \operatorname{im} T^2, \ \ldots, \ \ker T \subseteq \operatorname{im} T^n.$$

(b) Suppose $\ker T \subseteq \operatorname{im} T^n$ for some $n$, then $x \in \ker T^2 \Rightarrow x - T^{n-1}y \in \ker T$
for some $y$, so
$$\ker T^2 \subseteq \operatorname{im} T^{n-1}, \ \ldots, \ \ker T^n \subseteq \operatorname{im} T.$$

6. There is an eigenvalue at the spectral radius of $K$, except possibly when this
is 0.

7. In $\ell^1$, the multiplier map $M(a_n) := (c_n a_n)$ is compact when $c_n \to 0$; its eigenvalues
are $c_n$. 0 is part of the continuous spectrum, unless it is an eigenvalue.
For example, take $c_n := 1/n$ (and $c_0 := 1$), and the shift operators $L$ and $R$;
then $ML$ is also compact but has no eigenvalues except 0; $RM$ is compact with
no eigenvalues at all but 0 is part of the residual spectrum.

8. (The original Fredholm alternative) For $\lambda \ne 0$, either $(K - \lambda)x = y$ has a
unique solution for each $y$ or $K^\top y = \lambda y$ has a non-trivial solution.

9. The minimal polynomial of each Jordan block is $(z - \lambda)^n$.

10. Cayley-Hamilton theorem: If $p$ is the characteristic polynomial of a matrix $T$,
then $p(T) = 0$. (Hint: consider the characteristic polynomial of each Jordan
block.)


14.4 The Functional Calculus 


The previous definition of $f(T)$ in Taylor's theorem can be extended to functions that
are analytic on the spectrum of $T$, since, by Cauchy's theorem, the path of integration
can be swept over analytic regions of $f$ and $(z - T)^{-1}$.


Definition 14.22

For any function $f : \mathbb{C} \to \mathbb{C}$ which is analytic in a neighborhood of $\sigma(T)$, let
$$f(T) := \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}\,dz,$$
where the path of integration is taken along simple closed curves enclosing
$\sigma(T)$ in a direction which keeps $\sigma(T)$ to its left.
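The contour integral of Definition 14.22 can be approximated numerically for a small matrix. The sketch below is an illustration under stated assumptions: it handles only $2 \times 2$ matrices, integrates over a circle (with hypothetical parameters `center`, `radius`, `samples`) that is assumed to enclose the whole spectrum, and uses the trapezoid rule.

```python
import cmath
import math

def inverse_2x2(M):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def functional_calculus(f, T, center=0.0, radius=2.0, samples=64):
    """Approximate (1/(2*pi*i)) * integral of f(z) (z - T)^{-1} dz over the
    circle |z - center| = radius, traversed anticlockwise."""
    result = [[0j, 0j], [0j, 0j]]
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        offset = radius * cmath.exp(1j * theta)
        z = center + offset
        resolvent = inverse_2x2([[z - T[0][0], -T[0][1]],
                                 [-T[1][0], z - T[1][1]]])
        # dz = i * offset * dtheta, so dz / (2*pi*i) = offset / samples
        weight = f(z) * offset / samples
        for i in range(2):
            for j in range(2):
                result[i][j] += weight * resolvent[i][j]
    return result

T = [[0, 1], [0, 0]]                                  # nilpotent, spectrum {0}
identity_of_T = functional_calculus(lambda z: z, T)   # f(z) = z should return T
exp_of_T = functional_calculus(cmath.exp, T)          # e^T = 1 + T, as T^2 = 0
```

Here $f(z) = z$ reproduces $T$ itself and $f = \exp$ gives $1 + T$, agreeing with the power series $e^T = 1 + T + T^2/2 + \cdots$ for this nilpotent matrix.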


Note that the integral is defined since $f(z)$ and $\|(z - T)^{-1}\|$ are continuous in $z$ on
the selected compact path; hence
$$\|f(T)\| \le \frac{1}{2\pi}\oint |f(z)|\,\|(z - T)^{-1}\|\,ds < \infty.$$

Examples 14.23

1. » If $TS = SR$ then $f(T)S = Sf(R)$ when $f$ is analytic on a neighborhood of
$\sigma(T) \cup \sigma(R)$, since
$$S(z - R) = (z - T)S$$
$$\therefore\ (z - T)^{-1}S = S(z - R)^{-1}$$
$$\therefore\ f(T)S = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}S\,dz = \frac{1}{2\pi i}\oint f(z)S(z - R)^{-1}\,dz = Sf(R).$$

In particular
(a) $f(S^{-1}TS) = S^{-1}f(T)S$; for example, $e^{S^{-1}TS} = S^{-1}e^T S$.
(b) $ST = TS$ implies $f(T)S = Sf(T)$ and $f(T)g(S) = g(S)f(T)$.

2. If $f \in C^\omega(\sigma(T))$ is zero on $\sigma(T)$, it does not follow that $f(T) = 0$, because $f(T)$
is defined in terms of a path-integral just outside $\sigma(T)$. For example, $T := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$
has $\sigma(T) = \{0\}$, and $f(z) := z$ vanishes there, yet $f(T) = T \ne 0$.

3. * $f$ is differentiable (and continuous) at $T$: for $H$ sufficiently small, $f(T + H)$
is defined since $\sigma(T + H) \subseteq \sigma(T) + B_\epsilon(0)$ (Proposition 14.6), and
$$f(T + H) = f(T) + \frac{1}{2\pi i}\oint f(w)(w - T)^{-1}H(w - T)^{-1}\,dw + o(H).$$

The next theorem proves that all algebraic properties of a complex function are 
mirrored by properties of f(T). 


Theorem 14.24 The Functional Calculus 


Given $T \in \mathcal{X}$, the map $f \mapsto f(T)$, $C^\omega(\sigma(T)) \to \mathcal{X}$, satisfies
$$(f + g)(T) = f(T) + g(T), \qquad (\lambda f)(T) = \lambda f(T),$$
$$(fg)(T) = f(T)g(T), \qquad 1(T) = 1,$$
$$f \circ g(T) = f(g(T)),$$
$$f_n \to f \text{ in } C(\sigma(T) + B_\epsilon(0))\ (\exists \epsilon > 0) \ \Rightarrow\ f_n(T) \to f(T) \text{ in } \mathcal{X}.$$

Proof We have already seen part of this theorem in action when analyzing power
series. In particular, the cases $1 = \frac{1}{2\pi i}\oint (z - T)^{-1}\,dz$ and $T^{-1} = \frac{1}{2\pi i}\oint z^{-1}(z - T)^{-1}\,dz$
were covered (Example 13.30(2)).

(i) $(f + g)(T) = f(T) + g(T)$ and $(\lambda f)(T) = \lambda f(T)$ express the linearity
property of the integral.

(ii) $(fg)(T) = f(T)g(T)$: We require the identity
$$(z - w)(z - T)^{-1}(w - T)^{-1} = (w - T)^{-1} - (z - T)^{-1},$$
which follows easily from $z - w = (z - T) - (w - T)$. In the following analysis,
consider two paths around $\sigma(T)$, one (with variable $z$) nested inside another (with
variable $w$).
$$\begin{aligned}
f(T)g(T) &= \frac{1}{(2\pi i)^2}\oint\oint f(z)g(w)(z - T)^{-1}(w - T)^{-1}\,dz\,dw \\
&= \frac{1}{(2\pi i)^2}\oint\oint f(z)g(w)\left(\frac{(z - T)^{-1}}{w - z} + \frac{(w - T)^{-1}}{z - w}\right)dz\,dw \\
&= \frac{1}{(2\pi i)^2}\oint g(w)(w - T)^{-1}\oint f(z)(z - w)^{-1}\,dz\,dw \\
&\quad + \frac{1}{(2\pi i)^2}\oint f(z)(z - T)^{-1}\oint g(w)(w - z)^{-1}\,dw\,dz \\
&= \frac{1}{2\pi i}\oint f(z)g(z)(z - T)^{-1}\,dz \\
&= (fg)(T)
\end{aligned}$$
where we have changed the order of integration in the third line, and used the fact
that $(w - z)^{-1}$ leaves a residue when integrated on the outer path, but not when
integrated on the inner path (because the singularity at $w$ would then be outside the
path of integration).

In particular, note that if $f$ is invertible on a neighborhood of $\sigma(T)$,
$$f(T)^{-1} = \frac{1}{2\pi i}\oint f(z)^{-1}(z - T)^{-1}\,dz. \qquad (14.1)$$

(iii) $f(g(T)) = \frac{1}{2\pi i}\oint f(z)(z - g(T))^{-1}\,dz$, where the right part of the integrand
is $(z - g(T))^{-1} = \frac{1}{2\pi i}\oint (z - g(w))^{-1}(w - T)^{-1}\,dw$ by (14.1). Combining the
two and using Cauchy's integral formula (Proposition 12.19), we get
$$\begin{aligned}
f(g(T)) &= \frac{1}{(2\pi i)^2}\oint\oint f(z)(z - g(w))^{-1}(w - T)^{-1}\,dw\,dz \\
&= \frac{1}{(2\pi i)^2}\oint\oint f(z)(z - g(w))^{-1}\,dz\,(w - T)^{-1}\,dw \\
&= \frac{1}{2\pi i}\oint f \circ g(w)(w - T)^{-1}\,dw \\
&= f \circ g(T).
\end{aligned}$$
Note that $f$ has to be analytic on $\sigma(g(T))$ and $g\sigma(T)$ for $f(g(T))$ and $f \circ g(T)$
to be defined, but the two sets are equal by the next theorem (which only uses part
(ii) of this theorem).

(iv) The mapping is continuous, since $\|(z - T)^{-1}\|$ is bounded by some constant $c$
on the compact path enclosing the open set $U := \sigma(T) + B_\epsilon(0)$:
$$\|f(T) - g(T)\| \le \frac{1}{2\pi}\oint |f(z) - g(z)|\,\|(z - T)^{-1}\|\,ds \le c\,\|f - g\|_{C(U)}. \qquad □$$


Theorem 14.25 Spectral Mapping Theorem 


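The spectrum of $f(T)$ is equal to the set $\{ f(\lambda) : \lambda \in \sigma(T) \}$, that is,
$$\sigma(f(T)) = f\sigma(T).$$

Proof For any $f$ analytic in a neighborhood of $\sigma(T)$:

(i) $\lambda \notin f\sigma(T) \Rightarrow \lambda \notin \sigma(f(T))$: Let $\lambda \ne f(z)$ for all $z \in \sigma(T)$; since $f\sigma(T)$ is
a closed set, there is a minimum distance between $\lambda$ and $f\sigma(T)$. So $(f(z) - \lambda)^{-1}$
is analytic on $\sigma(T) + B_\epsilon(0)$ if $\epsilon$ is small enough, and by the functional calculus
$(f(T) - \lambda)^{-1}$ exists. Thus $f(T) - \lambda$ is invertible.

(ii) $f(T) - f(\lambda)$ invertible $\Rightarrow$ $T - \lambda$ invertible: if $f(T) - f(\lambda)$ has an inverse $S$,
we see from rewriting $f(z) - f(\lambda) = (z - \lambda)F(z)$, and the functional calculus, that
$$(T - \lambda)F(T)S = 1 = SF(T)(T - \lambda),$$
which implies that the factor $T - \lambda$ itself is invertible. This is justified once it is
shown that $F(z)$ is analytic about $\sigma(T)$; this is apparent when $z \ne \lambda$, but even so,
$$f(z) = f(\lambda) + f'(\lambda)(z - \lambda) + \tfrac12 f''(\lambda)(z - \lambda)^2 + o(z - \lambda)^2,$$
$$\Rightarrow\ F(z) = \frac{f(z) - f(\lambda)}{z - \lambda} = f'(\lambda) + \tfrac12 f''(\lambda)(z - \lambda) + o(z - \lambda),$$
meaning $F$ is analytic at $\lambda$. □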
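For matrices the spectral mapping theorem can be checked by direct computation. The following sketch uses an illustrative symmetric matrix `T` and the polynomial $f(z) = z^2 + 1$ (both chosen here, not taken from the text), and compares the eigenvalues of $f(T)$ with the images under $f$ of the eigenvalues of $T$.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix, sorted by real part."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return sorted([(trace - root) / 2, (trace + root) / 2],
                  key=lambda z: z.real)

def f(z):
    return z * z + 1

T = [[2, 1], [1, 2]]                       # eigenvalues 1 and 3
T_squared = matmul(T, T)
f_of_T = [[T_squared[i][j] + (1 if i == j else 0) for j in range(2)]
          for i in range(2)]               # f(T) = T^2 + 1

spectrum_of_f_T = eigenvalues_2x2(f_of_T)
f_of_spectrum = sorted([f(z) for z in eigenvalues_2x2(T)],
                       key=lambda z: z.real)
```

Both lists contain the values $2 = f(1)$ and $10 = f(3)$, as the theorem predicts.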
Examples 14.26 


1. $\log T$ can be defined whenever there is a path, or "branch", connecting 0 to $\infty$
without meeting $\sigma(T)$, because in this case, $\log z$ can be defined and is analytic on
$\sigma(T)$. But note that $\log z$, and consequently $\log T$, depends on the actual branch
used.

When defined, $e^{\log T} = T$. Such elements must be in $\mathcal{G}_1$ (Proposition 13.24).

2. Similarly one can define $T^a := e^{a\log T}$ (again not uniquely); then $(T^{1/n})^n = T$
($n = 1, 2, \ldots$), and $T^{a+b} = T^a T^b$ (at least for $a$, $b$ real). By the spectral mapping
theorem, $\rho(T^a) = \rho(T)^a$ for $a \ge 0$.

3. If $T$ satisfies a polynomial $p(T) = 0$, then $\sigma(T)$ consists of the roots of the
minimal polynomial of $T$ (Example 13.3(12)).

Proof The spectral mapping theorem shows that $p(\sigma(T)) = 0$, i.e., that the spectrum
consists of roots of $p$. Conversely, if $\lambda$ is a root of the minimal polynomial,
$p(\lambda) = 0$, then $p(z) = (z - \lambda)^n q(z)$, so $0 = p(T) = (T - \lambda)^n q(T)$, where
$q(T) \ne 0$ and thus $T - \lambda$ is not invertible.

4. » If $\lambda$ is an eigenvalue of $T \in B(X)$ then $f(\lambda)$ is an eigenvalue of $f(T)$, with
the same eigenvector.

Proof When $Tx = \lambda x$, then $(z - T)x = (z - \lambda)x$ and $(z - T)^{-1}x = (z - \lambda)^{-1}x$
($z \notin \sigma(T)$), so
$$f(T)x = \frac{1}{2\pi i}\oint f(z)(z - T)^{-1}x\,dz = \frac{1}{2\pi i}\oint f(z)(z - \lambda)^{-1}x\,dz = f(\lambda)x.$$

Conversely suppose $f(T) - f(\lambda)$ is not 1-1. Take an open neighborhood $U \supseteq \sigma(T)$
in which $f$ is analytic. Then, either $f$ is constant on $U$, or else there are
only a finite number of $\lambda_i \in \sigma(T)$ satisfying $f(\lambda_i) = f(\lambda)$. So, for $z \in U$,
$f(z) - f(\lambda) = (z - \lambda_1) \cdots (z - \lambda_k)\,g(z)$ (where multiple roots are repeated) with
$g$ analytic and non-zero on $U$, and consequently
$$f(T) - f(\lambda) = (T - \lambda_1) \cdots (T - \lambda_k)\,g(T).$$

But $f(T) - f(\lambda)$ is not 1-1, so there must be a $\lambda_i$ such that $T - \lambda_i$ is not 1-1
($g(T)$ is invertible), and $f(\lambda_i) = f(\lambda)$.


Proposition 14.27 


If $\sigma(T)$ disconnects into two closed sets $\sigma_1 \cup \sigma_2$, each surrounded by simple
closed paths in open neighborhoods of them, then

(i) $T = TP_1 + TP_2$, with $P_1$, $P_2$ (called spectral idempotents) such
that $1 = P_1 + P_2$, $P_i P_j = \delta_{ij}P_i$,
(ii) In the reduced algebras $P_1\mathcal{X}P_1$, $P_2\mathcal{X}P_2$ respectively,
$$\sigma(TP_1) = \sigma_1, \qquad \sigma(TP_2) = \sigma_2.$$

Proof The disjoint closed sets $\sigma_1$ and $\sigma_2$ can be separated by disjoint open sets $U_1$,
$U_2$ (Exercise 5.7(5)). Consider the functions $\chi_i$ ($i = 1, 2$) which take the constant
value 1 on one open set $U_i \supseteq \sigma_i$, and 0 on the other. They are analytic on $U_1 \cup U_2$,
so we can define
$$P_i := \chi_i(T) = \frac{1}{2\pi i}\oint_{\sigma_i} (z - T)^{-1}\,dz.$$

The path of integration is the union of the two paths surrounding $\sigma_1$ and $\sigma_2$, but one
of the two integrals vanishes.

$P_i$ are idempotents, $P_1P_2 = 0$, and $P_1 + P_2 = 1$, because $\chi_i^2 = \chi_i$, $\chi_1\chi_2 = 0$
and $\chi_1 + \chi_2 = 1$ on $U_1 \cup U_2 \supseteq \sigma(T)$.

Let $f_i(z) := z\chi_i(z)$; then $f_i(T) = TP_i$ and $\sigma(f_i(T)) = f_i(\sigma(T)) = \sigma_i \cup \{0\}$.
However, if we restrict to the reduced algebra $P_i\mathcal{X}P_i$, with unity $P_i$, this changes
slightly. Since $z - \lambda$ is invertible in $C^\omega(\sigma_i)$ if, and only if, $\lambda \notin \sigma_i$, it follows that
there exists an $S$ such that $S(T - \lambda)P_i = P_i = (T - \lambda)SP_i$ whenever $\lambda \notin \sigma_i$; this
means that $(T - \lambda)P_i$ is invertible in $P_i\mathcal{X}P_i$. Thus, $\sigma(TP_i) = \sigma_i$ in this algebra. □
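The spectral idempotents can be computed numerically for a matrix by integrating the resolvent around one part of the spectrum only. The sketch below is illustrative: the matrix `T`, the circular contour, and the helper names are assumptions made here, and the trapezoid rule stands in for the exact integral.

```python
import cmath
import math

def inverse_2x2(M):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_idempotent(T, center, radius, samples=64):
    """Approximate (1/(2*pi*i)) * integral of (z - T)^{-1} dz over the
    circle |z - center| = radius."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(samples):
        offset = radius * cmath.exp(2j * math.pi * k / samples)
        z = center + offset
        resolvent = inverse_2x2([[z - T[0][0], -T[0][1]],
                                 [-T[1][0], z - T[1][1]]])
        for i in range(2):
            for j in range(2):
                # dz / (2*pi*i) = offset / samples on the circle
                P[i][j] += offset / samples * resolvent[i][j]
    return P

T = [[1, 1], [0, 3]]                        # spectrum {1} and {3}
P1 = spectral_idempotent(T, center=1, radius=1)   # circle encloses 1 only
P2 = [[(1 if i == j else 0) - P1[i][j] for j in range(2)] for i in range(2)]
P1_squared = matmul(P1, P1)
P1_P2 = matmul(P1, P2)
T_P1 = matmul(T, P1)
```

For this matrix $P_1 = \begin{pmatrix} 1 & -\tfrac12 \\ 0 & 0 \end{pmatrix}$, which satisfies $P_1^2 = P_1$, $P_1P_2 = 0$, and $TP_1 = P_1$ because $T$ acts as the eigenvalue 1 on $\operatorname{im} P_1$.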


Examples 14.28 


1. » When the algebra is $B(X)$, $P_i$ are projections, and the spectral decomposition
of an operator $T$ into $TP_1$ and $TP_2$ also gives a decomposition of $X = X_1 \oplus X_2$
where $X_i = \operatorname{im} P_i$ are $T$-invariant, and $\sigma(T|_{X_i}) = \sigma_i$. (Theorem 13.8(11))

2. If 0 is an isolated point of $\sigma(T)$, with spectral idempotent $P$, then there is a
Laurent expansion
$$(z - T)^{-1}P = \frac{P}{z} + \frac{TP}{z^2} + \frac{T^2P}{z^3} + \cdots.$$

3. If $0 \notin \sigma_1$, then $P_1 = T\left(\frac{1}{2\pi i}\oint_{\sigma_1} z^{-1}(z - T)^{-1}\,dz\right)$. For example, when $T$ is a compact
operator and $\lambda \ne 0$ is an isolated point of $\sigma(T)$, then the projection $P_\lambda$ is also
compact, confirming that the eigenspace of $\lambda$ is finite-dimensional.


Exercises 14.29 


1. The non-trivial idempotents have spectrum { 0, 1 }, and the nilpotents have spec- 
trum {0}. What can the spectrum of a cyclic element be? 


2. If $f$ takes the value 0 inside $\sigma(T)$ then $f(T)$ is not invertible.


3. Use the spectral mapping theorem to show that if $e^T = 1$ then $\sigma(T) \subseteq 2\pi i\mathbb{Z}$. If
$P$ is an idempotent, then $e^{2\pi i P} = 1$.


4. If $J$ is a Banach algebra morphism, then $f(J(T)) = J(f(T))$ (recall $\sigma(J(T)) \subseteq \sigma(T)$).


5. Show directly that the matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ has no square root at all.


The shift operators on $\ell^2$, say, cannot have a square root because their spectrum
encloses 0 (even on $\ell^1(\mathbb{Z})$ when $L$ and $R$ are invertible). Prove this directly by
showing the contradictions

(a) if $T^2 = L$, then $T$ must be onto and $\ker T = \ker L = [\![e_0]\!]$, so $e_0 = aTe_0 = 0$;

(b) if $T^2 = R$, then $T$ is 1-1, and $\operatorname{im} T = \operatorname{im} R$, so $TRx = RTx = (0, 0, y_0, \ldots)$.


6. A simple linear electronic circuit with feedback can be modeled as an operator,
transforming an input signal $x = (x_n)$ to an output signal $y = (y_n)$ such that
$$y_n = bx_n - a_1y_{n-1} - \cdots - a_ry_{n-r},$$
where $b$, $a_i$ are parameters determined by the circuit. Equivalently,
$$(1 + a_1R + \cdots + a_rR^r)y = bx,$$
where $R$ is the right-shift operator. To avoid the once-familiar feedback loop
instability, it is desired that the values $y_n$ do not grow of their own accord, meaning
that $1 + a_1R + \cdots + a_rR^r$ has a continuous inverse. This is the case when the
roots of the polynomial $1 + a_1z + \cdots + a_rz^r$ all have magnitude greater than 1.
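The stability criterion of Exercise 6 can be tried out on a second-order feedback loop. The sketch below simulates the recurrence for a unit impulse and compares the outcome with the root magnitudes; the coefficient lists and function names are hypothetical values chosen for illustration.

```python
import cmath

def feedback_response(b, a, x):
    """y_n = b*x_n - a_1*y_{n-1} - ... - a_r*y_{n-r}, with y_n = 0 for n < 0."""
    y = []
    for n, x_n in enumerate(x):
        feedback = sum(a[j] * y[n - 1 - j]
                       for j in range(len(a)) if n - 1 - j >= 0)
        y.append(b * x_n - feedback)
    return y

def quadratic_roots(a1, a2):
    """Roots of 1 + a1*z + a2*z^2 (a2 != 0) by the quadratic formula."""
    root = cmath.sqrt(a1 * a1 - 4 * a2)
    return ((-a1 - root) / (2 * a2), (-a1 + root) / (2 * a2))

impulse = [1.0] + [0.0] * 199

# 1 - 0.5 z + 0.06 z^2 = (1 - 0.2 z)(1 - 0.3 z): roots 5 and 10/3
stable_a = [-0.5, 0.06]
# 1 - 2.5 z + z^2 = (1 - 2 z)(1 - z/2): roots 1/2 and 2
unstable_a = [-2.5, 1.0]

stable_y = feedback_response(1.0, stable_a, impulse)
unstable_y = feedback_response(1.0, unstable_a, impulse)
stable_roots = quadratic_roots(*stable_a)
unstable_roots = quadratic_roots(*unstable_a)
```

With both roots outside the unit disk the impulse response dies away, whereas a single root of magnitude $\tfrac12$ makes the output grow like $2^n$.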
Quasinilpotents and the Radical


Definition 14.30


The quasinilpotents are those elements $Q$ with $\rho(Q) = 0$. The (Jacobson)
radical $\mathcal{J}$ of $\mathcal{X}$ is
$$\mathcal{J} := \{\, Q : \forall T \in \mathcal{X},\ \rho(TQ) = 0 \,\}.$$


A Banach algebra with a trivial radical is called semi-primitive or semi-simple. 


The next proposition shows that the radical is a closed ideal, which can be factored 
out to leave a semi-simple Banach algebra. 


Examples 14.31 
1. The prime examples of quasinilpotents are the nilpotents, defined as those elements
which satisfy $Q^n = 0$ for some $n$, so $\rho(Q) \le \|Q^n\|^{1/n} = 0$; e.g. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$.

2. Every operator $Tf(x) := \int_0^x k(x, y)f(y)\,dy$ on $C[0, 1]$, where $k \in L^\infty[0, 1]^2$,
is a quasinilpotent.

Proof $|Tf(x)| \le \int_0^x |k(x, y)|\,|f(y)|\,dy \le \|k\|\,\|f\|\,x$. By induction one can conclude
$|T^n f(x)| \le \|k\|^n \|f\|\, x^n/n!$,
$$\begin{aligned}
|T^{n+1}f(x)| &= \left|\int_0^x k(x, y)T^n f(y)\,dy\right| \\
&\le \int_0^x \|k\|^{n+1}\|f\|\, y^n/n!\,dy \\
&\le \|k\|^{n+1}\|f\|\, x^{n+1}/(n + 1)!
\end{aligned}$$
so $\|T^n\| \le \|k\|^n/n!$ and $\rho(T) \le \|T^n\|^{1/n} \le \|k\|/\sqrt[n]{n!} \to 0$.

3. The sum and product of quasinilpotents need not be quasinilpotents, e.g. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$
and $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$.

4. The quasinilpotents are topological divisors of zero since their spectrum is a
boundary point. Idempotents (except 0 and 1) are divisors of zero but not quasinilpotents.

[Figure: diagram showing the divisors of zero contained in the topological divisors of zero]

5. Radical elements are obviously quasinilpotents, $\rho(Q) = \rho(1Q) = 0$.

6. It is enough to show that $1 \notin \sigma(TQ)$ for all $T$, in order that $Q \in \mathcal{J}$.

Proof For any $\lambda \ne 0$, $1 \notin \sigma(TQ/\lambda) = \sigma(TQ)/\lambda \Rightarrow \lambda \notin \sigma(TQ)$.

7. » For any $T \in \mathcal{X}$, $Q \in \mathcal{J}$, $\sigma(T + Q) = \sigma(T)$.

Proof For any invertible $S$, the sum $S + Q = S(1 + S^{-1}Q)$ is also invertible,
since $\rho(S^{-1}Q) = 0$ (Theorem 13.20). Thus
$$\lambda \notin \sigma(T + Q) \iff T + Q - \lambda \text{ is invertible} \iff T - \lambda \text{ is invertible} \iff \lambda \notin \sigma(T).$$

8. $B(X)$ has nilpotents (except for $X = \mathbb{C}$) but only a trivial radical.

Proof For any $Q \ne 0$, an operator $T$ can be found such that $1 - TQ$ is non-invertible,
so $1 \in \sigma(TQ)$. One such operator is $T := x\phi$, where $Qx \ne 0$, $\phi \in X^*$,
$\phi Qx = 1$; then $(1 - TQ)x = x - x\phi Qx = 0$ but $x \ne 0$.
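The estimate in Example 14.31(2) can be observed numerically in the simplest case $k = 1$, where $T$ is plain integration from 0 to $x$. The sketch below is an illustration only: it discretizes $T$ with a cumulative trapezoid rule on an assumed uniform grid and applies it repeatedly to the constant function 1, for which $T^n 1 = x^n/n!$.

```python
import math

def volterra(values, h):
    """Cumulative trapezoid rule: approximates x -> integral_0^x f(y) dy
    on a uniform grid with spacing h."""
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (left + right))
    return out

grid_points = 2001
h = 1.0 / (grid_points - 1)
f = [1.0] * grid_points                    # the constant function 1 on [0, 1]

sup_norms = []
for n in range(1, 9):
    f = volterra(f, h)                     # f now approximates T^n 1
    sup_norms.append(max(abs(v) for v in f))

# n-th roots of the sup norms, approximating (1/n!)^(1/n)
nth_roots = [s ** (1.0 / n) for n, s in enumerate(sup_norms, start=1)]
```

The computed sup norms follow $1/n!$, and their $n$th roots decrease steadily, consistent with $\rho(T) = 0$.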


Proposition 14.32 


The radical is a closed ideal. 


Proof $\mathcal{J}$ is contained in every maximal left-ideal: Recall that a maximal left-ideal
is closed and that every proper left-ideal can be enlarged to a maximal left-ideal
(Examples 13.5(7,8)). Let $Q \in \mathcal{J}$, and let $M$ be a maximal left-ideal. Then $M + \mathcal{X}Q$
is a left-ideal which contains $M$. Either

(a) $M + \mathcal{X}Q = \mathcal{X}$, in which case $1 = R + TQ$ for some $R \in M$, $T \in \mathcal{X}$, so that
$R = 1 - TQ$ is invertible, contradicting $R \in M$ (Example 13.5(5)); or else,

(b) $M + \mathcal{X}Q = M$, in which case $Q = 0 + 1Q \in M$.

Thus $\mathcal{J} \subseteq M$ as required; an analogous argument shows that $\mathcal{J}$ is contained in
every maximal right-ideal.

$\mathcal{J}$ is the intersection of the maximal left-ideals: Let $P$ be an element that is
contained in every maximal left-ideal. For any $T \in \mathcal{X}$, the left-ideal $\mathcal{X}(1 - TP)$
cannot be proper, otherwise it would lie inside some maximal left-ideal $M$, forcing
$P \in M$, and $TP \in M$, and so $1 = TP + (1 - TP) \in M$, a contradiction. Hence
$\mathcal{X}(1 - TP) = \mathcal{X}$, and there is an $S$ such that $S(1 - TP) = 1$.

To show $1 - TP$ is invertible we need to prove $(1 - TP)S = 1$ as well. To this
end one can substitute $-ST$ for $T$ in the above argument, to conclude that there is
an $R \in \mathcal{X}$ such that
$$1 = R(1 + STP) = R(S + 1 - S(1 - TP)) = RS.$$
But $RS = 1 = S(1 - TP)$ implies $1 - TP = S^{-1}$ is invertible. With $1 \notin \sigma(TP)$
for any $T$, $P$ must be in the radical.

$\mathcal{J}$ is a closed ideal: Being the intersection of closed sets, $\mathcal{J}$ is also closed (Proposition
2.18). For any $S, T \in \mathcal{X}$ and $Q, Q' \in \mathcal{J}$,
(a) $\rho(STQ) = 0 = \rho(SQT)$, so $TQ, QT \in \mathcal{J}$,
(b) $\sigma(T(Q + Q')) = \sigma(TQ) = \{0\}$ from Example 14.31(7) above ($TQ' \in \mathcal{J}$), so
$Q + Q' \in \mathcal{J}$;
(c) $\rho(T(\lambda Q)) = |\lambda|\rho(TQ) = 0$, so $\lambda Q \in \mathcal{J}$,

and $\mathcal{J}$ is an ideal. □

The State Space


Definition 14.33 


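The state space of a Banach algebra is the set of functionals
$$\mathcal{S}(\mathcal{X}) := \{\, \phi \in \mathcal{X}^* : \phi 1 = 1 = \|\phi\| \,\}.$$

We often write $\mathcal{S}$ for $\mathcal{S}(\mathcal{X})$ and $\mathcal{S}(T) := \{\, \phi T \in \mathbb{C} : \phi \in \mathcal{S}(\mathcal{X}) \,\}$, for example,
$\mathcal{S}(1) = \{1\}$.


Proposition 14.34

The state space $\mathcal{S}(\mathcal{X})$ is a convex set containing the character space $\Delta(\mathcal{X})$.
For any $T \in \mathcal{X}$, $\mathcal{S}(T)$ is a compact convex subset of $\mathbb{C}$, and
$$\Delta(T) \subseteq \sigma(T) \subseteq \mathcal{S}(T).$$

Proof (i) $\mathcal{S}(\mathcal{X})$ and $\mathcal{S}(T)$ are convex: For $\phi, \psi \in \mathcal{S}$ and $0 \le t \le 1$,
$$(t\phi + (1 - t)\psi)1 = t + 1 - t = 1, \quad\text{and}$$
$$\|t\phi + (1 - t)\psi\| \le t\|\phi\| + (1 - t)\|\psi\| = 1.$$

It follows from $t\phi T + (1 - t)\psi T = (t\phi + (1 - t)\psi)T \in \mathcal{S}(T)$ that $\mathcal{S}(T)$ is convex.

$\mathcal{S}(T)$ is compact: $\mathcal{S}(T)$ is bounded since $|\phi T| \le \|T\|$ for any $\phi \in \mathcal{S}$. Now recall
that every bounded sequence in $\mathcal{X}^*$ has a weak*-convergent subsequence (Theorem
11.40 for $\mathcal{X}$ separable). So whenever $\phi_n T \in \mathcal{S}(T)$ converges to a limit point $z$, there
is a subsequence of $\phi_n$ that converges in the weak* sense, $\phi_{n_i} \rightharpoonup \phi \in \mathcal{X}^*$, implying

(a) $\phi_{n_i}T \to \phi T = z$ and $1 = \phi_{n_i}1 \to \phi 1$,

(b) $\|\phi\| \le \liminf_i \|\phi_{n_i}\| = 1$ (Corollary 11.35).

Hence $\phi \in \mathcal{S}$ and $z \in \mathcal{S}(T)$, that is, $\mathcal{S}(T)$ is closed and bounded.

(ii) $\sigma(T) \subseteq \mathcal{S}(T)$: If $S \in \mathcal{X}$ is not invertible, then $1 \notin [\![S]\!]$; indeed $d(1, [\![S]\!]) = 1$
as $[\![S]\!]$ contains no invertible elements (Theorem 13.20). So by the Hahn-Banach
theorem, there is a $\phi \in \mathcal{X}^*$ satisfying $\phi 1 = 1 = \|\phi\|$ and $\phi S = 0$ (Proposition
11.18). In particular, for $S = T - \lambda$, where $\lambda \in \sigma(T)$, there is a $\phi \in \mathcal{S}$ such that
$$0 = \phi(T - \lambda) = \phi T - \lambda,$$
so $\lambda = \phi T \in \mathcal{S}(T)$.

(iii) $\Delta(T) \subseteq \sigma(T)$: Recall that any character $\psi \in \Delta$ maps invertible elements
to invertible complex numbers (Example 13.7(1)), including $\psi 1 = 1$. So for any
$\lambda \notin \sigma(T)$, $\psi T - \lambda = \psi(T - \lambda) \ne 0$, and $\lambda \notin \Delta(T)$. Equivalently, $\Delta(T) \subseteq \sigma(T)$
and $|\psi T| \le \rho(T) \le \|T\|$. This means that $\psi$ is automatically continuous with
$\|\psi\| = 1$, and so $\Delta \subseteq \mathcal{S}$. □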


Examples 14.35 


1. » The characters of $\ell^1$ are of the type $\psi(a_n) = \sum_{n=0}^\infty a_n z^n$, where $|z| \le 1$
depends on $\psi$.

Proof Let $\psi \in \Delta \subseteq \ell^{1*} \equiv \ell^\infty$ (Proposition 9.6); then every sequence in $\ell^1$ can
be written as
$$x = (a_0, a_1, \ldots) = \sum_{n=0}^\infty a_n e_n = \sum_{n=0}^\infty a_n (\underbrace{e_1 * \cdots * e_1}_{n}),$$
$$\psi x = \sum_{n=0}^\infty a_n \psi(e_1 * \cdots * e_1) = \sum_{n=0}^\infty a_n z^n \qquad (z := \psi e_1),$$
where the multiplicative property $\psi(e_1 * \cdots * e_1) = (\psi e_1)^n$ was used. The
requirement $1 = \|\psi\| = \|(z^n)\|_{\ell^\infty}$ implies $|z| \le 1$, else $|z|^n$ would grow beyond
1 as $n \to \infty$.

2. The characters of $\ell^1(\mathbb{Z})$ are $\psi_\theta(a_n) = \sum_{n=-\infty}^\infty a_n e^{in\theta}$.

The proof is the same as above except $|z| = 1$, that is, $z = e^{i\theta} \in S^1$ for some
$0 \le \theta < 2\pi$.

3. For $L^1(S^1)$, the characters are $\psi_n(f) = \int_0^{2\pi} e^{in\theta} f(\theta)\,d\theta$, ($n \in \mathbb{Z}$).

Proof Let $\psi \in \Delta \subseteq L^1(S^1)^* \equiv L^\infty(S^1)$, so $\psi(f) = \int_0^{2\pi} h(\theta)f(\theta)\,d\theta$ for
some $h \in L^\infty(S^1)$. Recall that $L^1(A)$ does not contain a unity for convolution
(Example 13.3(5)); nevertheless, one can be added artificially, so $\Delta$ exists and its
characters act on $L^1(A)$. Again we require

(a) $1 = \|\psi\| = \|h\|_{L^\infty}$, so $|h(\theta)| \le 1$ for almost all $\theta$;
(b) $\psi(f * g) = \psi(f)\psi(g)$, or equivalently,
$$\int_0^{2\pi} h(\theta)\int_0^{2\pi} f(\theta - \eta)g(\eta)\,d\eta\,d\theta = \int_0^{2\pi} h(\theta)f(\theta)\,d\theta \int_0^{2\pi} h(\eta)g(\eta)\,d\eta.$$

This implies that $h(\theta + \eta) = h(\theta)h(\eta)$ a.e.; we've met this identity before in
our preliminary discussion on the exponential function in Section 13.2, where we
concluded that $h(\theta) = h(1)^\theta = e^{c\theta}$, assuming $h$ is continuous. That this can be
taken to be the case follows from Corollary 9.22,
$$\left|\int (h(\eta + \epsilon) - h(\eta))f(\eta)\,d\eta\right| = \left|\int h(\eta)(f(\eta - \epsilon) - f(\eta))\,d\eta\right| \le \int |f(\eta - \epsilon) - f(\eta)|\,d\eta \to 0.$$

Moreover, $h(2\pi) = h(0) = 1$ implies that $h(1) = e^{in}$ for some $n \in \mathbb{Z}$.

4. For $L^1(\mathbb{R})$, the characters are $\psi_\xi(f) = \int_{-\infty}^\infty e^{i\xi x} f(x)\,dx$, ($\xi \in \mathbb{R}$).

Proof Let $\psi \in \Delta \subseteq L^1(\mathbb{R})^* \equiv L^\infty(\mathbb{R})$; so $\psi(f) = \int_{-\infty}^\infty h(x)f(x)\,dx$. As before,
$|h(x)| \le 1$ for all $x$, while the condition $\psi(f * g) = \psi(f)\psi(g)$ is equivalent to
$h(x + y) = h(x)h(y)$ a.e., so $h(x) = h(1)^x$. To avoid $h(x)$ growing arbitrarily
large as $x \to \pm\infty$, $|h(1)|$ must be 1, and $h(x) = e^{i\xi x}$.

5. Repeating for $L^1(\mathbb{R}^+)$, $\Delta = \{\, e^{-zx} : \operatorname{Re} z \ge 0 \,\}$.
6. * For $C[0, 1]$, $\Delta = \{\, \delta_x \in C[0, 1]^* : \delta_x(f) = f(x),\ x \in [0, 1] \,\} \cong [0, 1]$.

Proof That $\delta_x$ are functionals (with unit norm) is Example 8.6(6). In addition,
$$\delta_x(fg) = (fg)(x) = f(x)g(x) = \delta_x(f)\delta_x(g), \quad\text{and}\quad \delta_x(1) = 1.$$

Note that for $x \ne y$, $\delta_x(f) \ne \delta_y(f)$ for some $f \in C[0, 1]$.

For the converse, let $\psi$ be a character of $C[0, 1]$. Define 'triangle' functions, $\tau_{n,i}(x)$,
as in the accompanying plot; note that these functions overlap and sum to 1 everywhere,
$\sum_i \tau_{n,i} = 1$.

[Plot: the triangle functions $\tau_{n,i}$ on $[0, 1]$, with ticks at $(i-1)/2^n$, $i/2^n$, $(i+1)/2^n$]

Then $1 = \psi 1 = \sum_i \psi(\tau_{n,i})$ and at least one triangle function must give
$\psi(\tau_{n,i_n}) \ne 0$. In fact, $\psi(\tau_{n,i}) = 0$ for $i \ne i_n - 1, i_n, i_n + 1$, since $\tau_{n,i}\tau_{n,i_n} = 0$. By
taking larger values of $n$, and selected values of $i_n$, the nested intervals $\left[\frac{i_n - 1}{2^n}, \frac{i_n + 1}{2^n}\right]$
shrink to some point $x$. For any function $f \in C[0, 1]$,
$$\psi f = \psi\Big(\sum_i \tau_{n,i}f\Big) = \psi\Big(\sum_{i=i_n-1}^{i_n+1} \tau_{n,i}f\Big) \to f(x), \quad\text{as } n \to \infty.$$

The map $x \mapsto \delta_x$ is thus 1-1 and onto $\Delta$. Furthermore $x_n \to x \iff \delta_{x_n} \rightharpoonup \delta_x$,
since the latter means $f(x_n) \to f(x)$ for all $f \in C[0, 1]$, in particular for the
identity function $f(x) := x$.

7. * The character space of the Banach algebra $\mathbb{C}[T_1, \ldots, T_n]$ generated by commuting
elements, is isomorphic to a compact subset of $\mathbb{C}^n$ (use the map
$\psi \mapsto (\psi T_1, \ldots, \psi T_n)$).

8. * The character space is weakly closed, i.e., $\psi_n \in \Delta$ AND $\psi_n \rightharpoonup \psi$ $\Rightarrow$ $\psi \in \Delta$.
Consequently, for a separable Banach algebra, $\Delta$ is a compact metric space.

Israel Gelfand (1913-2009) studied functional analysis at the
University of Moscow under Kolmogorov in 1935, specializing
in commutative normed rings. During 1939-41 he studied Banach
algebras, introducing his transform and proving the spectral
radius formula, which gave much impetus to the subject; in
1943, with Naimark, he proved the embedding of special commutative
*-algebras into $B(H)$; and then in 1948 he simplified
the subject-matter with the introduction of the C*-condition
$\|x^*x\| = \|x\|^2$.

Fig. 14.2 Gelfand

Proof Taking the limits of $\psi_n(S + T) = \psi_nS + \psi_nT$, $\psi_n(\lambda T) = \lambda\psi_nT$,
$\psi_n(ST) = (\psi_nS)(\psi_nT)$, and $\psi_n1 = 1$, shows that $\psi$ is an algebraic morphism.
Also $|\psi_nT| \le \|T\|$ becomes $|\psi T| \le \|T\|$ in the limit $n \to \infty$, and $\psi$ is continuous.
For a separable Banach algebra, the unit ball in $\mathcal{X}^*$ is compact with respect
to the weak*-metric (Theorem 11.40), and so is its weakly closed subset $\Delta$.


The Gelfand Transform 


To see why characters may be useful, consider the algebra $\ell^1$ and its characters $\psi_z$.
A sequence such as $x = (1/2, 1/4, 1/8, \ldots)$ can be encoded as a complex power
series in terms of its characters, $\psi_z(x) = \sum_{n=0}^\infty z^n/2^{n+1} = (2 - z)^{-1}$. Then the
convolution product $x * \cdots * x$ can be evaluated using characters instead of working
it out directly,
$$\psi_z(x * \cdots * x) = (\psi_z x)^N = \frac{1}{(2 - z)^N} = \frac{1}{2^N}\sum_{n=0}^\infty \frac{N(N+1)\cdots(N+n-1)}{n!}\Big(\frac{z}{2}\Big)^n.$$

For an example from probability theory, consider a random variable that outputs a
natural number $n = 0, 1, 2, \ldots$, with probability $1/2^{n+1}$. The probability distribution
of the sum of $N$ such random outputs is $x * \cdots * x$, which can be read off from
the coefficients of $\psi_z(x)^N$; e.g. the probability of getting a total of, say 2, after $N$
trials is $N(N + 1)/2^{N+3}$. Further, the mean of such a sum of random variables is
given by differentiating $(2 - z)^{-N}$ at $z = 1$, that is $N$. The key step is to consider
$\psi_z(x)$ as a function of $z$. Its generalization leads to:
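The shortcut through characters can be compared with a direct calculation. The sketch below convolves a truncation of $x = (1/2, 1/4, \ldots)$ with itself using exact fractions and checks the result against the coefficients of $(2 - z)^{-N}$; the truncation length and the choice $N = 3$ are arbitrary values picked for illustration.

```python
from fractions import Fraction
from math import comb

def convolve(x, y, length):
    """First `length` terms of the convolution (x*y)_n = sum_k x_k y_{n-k}."""
    return [sum(x[k] * y[n - k] for k in range(n + 1)) for n in range(length)]

length = 8
x = [Fraction(1, 2 ** (n + 1)) for n in range(length)]   # (1/2, 1/4, 1/8, ...)

N = 3
power = x
for _ in range(N - 1):
    power = convolve(power, x, length)                    # x * x * x, truncated

# Coefficients of (2 - z)^(-N):
# N(N+1)...(N+n-1) / (n! 2^(N+n)) = C(N+n-1, n) / 2^(N+n)
predicted = [Fraction(comb(N + n - 1, n), 2 ** (N + n)) for n in range(length)]
```

The first `length` terms of a convolution depend only on the first `length` terms of each factor, so the truncated calculation is exact; in particular the coefficient of $z^2$ equals $N(N+1)/2^{N+3}$.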


Definition 14.36 


The Gelfand transform of T is the map T A(X) — o(T) defined by 
Ty) := VT. 


The element T is transformed into a function on the compact space A. The alge- 
braic structure is preserved, but the transform is generally neither 1—1 nor onto. 
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Proposition 14.37 


The Gelfand transform G : T +> T is a Banach algebramorphism 
X* > C(A), 


Its kernel ker G contains the quasinilpotents and the commutators. 


For any analytic function on the spectrum of T, f « C°(o(T)), 


f(T) =fo 


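Proof It is clear from


$$|\widehat{T}(\psi) - \widehat{T}(\phi)| = |\psi T - \phi T| \le \|\psi - \phi\|\,\|T\|$$
$$\text{and } |\widehat{T}(\psi)| = |\psi T| \le \|T\|, \text{ for all } \psi, \phi \in \Delta,$$


that $\widehat{T}$ is a (continuous) Lipschitz and bounded function on $\Delta$, with $\|\widehat{T}\|_C \le \|T\|$.
For any $\psi \in \Delta$, we have:


$$\widehat{1}(\psi) = \psi 1 = 1,$$
$$\widehat{\lambda T}(\psi) = \psi(\lambda T) = \lambda\,\psi T = \lambda \widehat{T}(\psi),$$
$$\widehat{S + T}(\psi) = \psi(S + T) = \psi S + \psi T = (\widehat{S} + \widehat{T})(\psi),$$
$$\widehat{ST}(\psi) = \psi(ST) = \psi S\,\psi T = \widehat{S}(\psi)\widehat{T}(\psi) = (\widehat{S}\widehat{T})(\psi).$$


Clearly, from $\widehat{T}(\psi) = \psi T$, $\widehat{T} = 0 \iff \Delta(T) = \{0\}$. If $Q$ is a quasinilpotent then $\Delta(Q) \subseteq \sigma(Q) = \{0\}$. Also, $\widehat{[S, T]} = \widehat{S}\widehat{T} - \widehat{T}\widehat{S} = 0$ since $C(\Delta)$ is commutative.
Lastly, as $\psi(S^{-1}) = (\psi S)^{-1}$, for any $\psi \in \Delta$ and invertible $S \in \mathcal{X}$,


$$\widehat{f(T)}(\psi) = \psi f(T) = \psi\Big(\frac{1}{2\pi i} \oint f(z)(z - T)^{-1}\,dz\Big)$$
$$= \frac{1}{2\pi i} \oint f(z)(z - \psi T)^{-1}\,dz \qquad (\psi T \in \sigma(T))$$
$$= f(\psi T) = f \circ \widehat{T}(\psi). \qquad \square$$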


We cannot expect the Gelfand transform to be very useful for general algebras as it loses information by representing $\mathcal{X}$ as a subspace of the special commutative algebra $C(\Delta)$; for example, $\widehat{S^{-1}TS} = \widehat{S}^{-1}\widehat{T}\widehat{S} = \widehat{T}$. But for commutative Banach algebras the situation is much improved:


Theorem 14.38 


For a commutative Banach algebra,


$$\operatorname{im} \widehat{T} = \Delta(T) = \sigma(T), \qquad \|\widehat{T}\|_{C(\Delta)} = \rho(T), \qquad \ker G = \mathcal{J}.$$


Proof Any maximal ideal of a commutative Banach algebra is the kernel of some character: Given a closed ideal $M$, the mapping $\Phi(T) := T + M$ is a Banach algebra morphism $\mathcal{X} \to \mathcal{X}/M$ with $M = \ker \Phi$ (Exercise 13.10(19)). By Exercise 13.10(18), when $M$ is also maximal in $\mathcal{X}$, then $\mathcal{X}/M$ has no non-trivial ideals, and so is isomorphic to $\mathbb{C}$ (Example 14.5(4)). Hence $\Phi : \mathcal{X} \to \mathcal{X}/M \cong \mathbb{C}$ is a character.

But any non-invertible $T$ belongs to some maximal ideal $M$ (Example 13.5(8)); so there must be some $\psi \in \Delta$ such that $M = \ker \psi$, implying $\psi T = 0$. Thus $T - \lambda$ is not invertible if, and only if, there is a $\psi \in \Delta$, with $\psi T - \lambda = \psi(T - \lambda) = 0$, i.e., $\lambda \in \Delta(T)$, and therefore $\Delta(T) = \sigma(T)$. (Note that this shows the existence of characters in a commutative Banach algebra.) Since the two sets are the same, they have the same greatest extent,


$$\|\widehat{T}\|_C = \max_{\psi \in \Delta} |\psi T| = \rho(T).$$


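The quasinilpotents are in the radical: If $Q$ is a quasinilpotent, and $T \in \mathcal{X}$, then


$$\rho(TQ) = \lim_{n\to\infty} \|(TQ)^n\|^{1/n} = \lim_{n\to\infty} \|T^n Q^n\|^{1/n} \le \rho(T)\rho(Q) = 0,$$


so $Q$ is in the radical. Moreover, $\ker G = \mathcal{J}$ since


$$\widehat{T} = 0 \iff \Delta(T) = \{0\} \iff \sigma(T) = \{0\}. \qquad \square$$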


Proposition 14.39 


A Banach algebra which satisfies, for some $c > 0$ and all $T$,
$$\|T\|^2 \le c\,\|T^2\|,$$


can be embedded in the commutative semi-simple Banach algebra $C(\Delta)$, via the Gelfand map.
Proof By induction on $n$,
$$\|T\|^{2^n} \le c^{2^{n-1}}\|T^2\|^{2^{n-1}} \le c^{2^n - 1}\,\|T^{2^n}\|,$$
from which can be concluded
$$\|T\| \le \lim_{n\to\infty} c^{1 - 2^{-n}}\,\|T^{2^n}\|^{2^{-n}} = c\,\rho(T).$$


This inequality has various strong implications:


$\mathcal{X}$ is semi-simple: 0 is clearly the only quasinilpotent.
$\mathcal{X}$ is commutative: For any $S, T \in \mathcal{X}$,


$$\|ST\| \le c\,\rho(ST) = c\,\rho(TS) \le c\,\|TS\|.$$
Hence, the analytic function $F(z) := e^{-zT} S e^{zT}$ is bounded,
$$\forall z \in \mathbb{C}, \quad \|F(z)\| \le c\,\|S e^{zT} e^{-zT}\| = c\,\|S\|.$$


By Liouville’s theorem, $F$ must be constant, $e^{-zT} S e^{zT} = S$, that is, $e^{zT} S = S e^{zT}$.
Comparing the second terms of their power series expansions,


$$(1 + zT + o(z))S = S(1 + zT + o(z)),$$


gives $TS = ST$.

The Gelfand map is an embedding: $G$ has the trivial kernel $\mathcal{J}$, and is thus an algebra isomorphism onto $\widehat{\mathcal{X}} \subseteq C(\Delta)$. Moreover, $\|T\| \le c\,\rho(T) = c\,\|\widehat{T}\|_C$, so $G^{-1}$ is continuous. $\square$
Exercises 14.40 

1. In $\mathbb{C}$, as well as $\mathbb{C}^N$, $\ell^\infty$ and $C[0, 1]$, the only quasinilpotent is 0.


2. Quasinilpotents are preserved by Banach algebra morphisms. 


3. A quasinilpotent upper triangular matrix must have Os on the main diagonal, so 
is nilpotent. Deduce, using the Jordan canonical form and Theorem 13.8, that 
every quasinilpotent of a finite-dimensional Banach algebra is nilpotent. 


4. $(Q, R) \in \mathcal{X} \times \mathcal{Y}$ is quasinilpotent (or radical) when both $Q$ and $R$ are.


5. The operator $V : \ell^\infty \to \ell^\infty$ defined by $V(a_n) := (0, a_0, a_1/2, a_2/3, \ldots)$ is quasinilpotent.


6. Prove directly that the Volterra operator $f \mapsto \int_0^x f$, on $C[0, 1]$, is a quasinilpotent.
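7. A quasinilpotent for which $\|(z - T)^{-1}\| \le \dfrac{c}{|z|^n}$ for all $z$ in a neighborhood of 0, must in fact be a nilpotent. (Hint: use $\|T^n\| \le \frac{1}{2\pi} \oint |z|^n \|(z - T)^{-1}\|\,|dz| \le \epsilon c$, taken over the circle $|z| = \epsilon$.)


8. $\rho(TQS) = 0$ for any $S, T \in \mathcal{X}$, $Q \in \mathcal{J}$. (Hint: Example 14.2(5).)


9. If $\psi \in \Delta$ and $f \in C^\omega(\sigma(T))$, then $\psi f(T) = f(\psi T)$.


10. $\mathcal{S}(T)$ and $\Delta(T)$ have better properties than $\sigma(T)$, and may yield useful information about it:

(a) $\mathcal{S}(S + T) \subseteq \mathcal{S}(S) + \mathcal{S}(T)$, $\mathcal{S}(1) = \{1\}$, $\mathcal{S}(\lambda T) = \lambda\mathcal{S}(T)$,
(b) $\Delta(S + T) \subseteq \Delta(S) + \Delta(T)$, $\Delta(ST) \subseteq \Delta(S)\Delta(T)$.


11. For $\mathbb{C}^N$, $\Delta = \{\delta_1, \ldots, \delta_N\}$ where $\delta_i(z_1, \ldots, z_N) := z_i$ are the dual basis. The same is true for the space $c_0$, $\Delta = \{\delta_i \in c_0^* : \delta_i(a_0, a_1, \ldots) = a_i\}$.


12. For $B(\mathbb{C}^2)$ (and $B(\mathbb{C}^N)$), $\Delta = \varnothing$. (Hint: Consider products of the matrix units $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, etc.)


13. For characters of the group algebra $\mathbb{C}^G$, $\psi(e_{h^{-1}gh}) = \psi(e_g)$ and $|\psi(e_g)| = 1$.


14. ▶ The invertible elements of a commutative $\mathcal{X}$ correspond to the invertible elements of $\widehat{\mathcal{X}}$.


15. The Gelfand transform on $\mathbb{C}^N$, mapping $\mathbb{C}^N \to C(\Delta) = \mathbb{C}^N$, is the identity map. The same is true for $C[0, 1]$, so $\sigma(f) = \operatorname{im} f$ for $f \in C[0, 1]$.


16. ▶ The Gelfand transform gathers together various classical transforms under one theoretical umbrella:

(a) Generating functions: $G : \ell^1 \to C(D)$, maps a sequence $x = (a_n)$ to a power series on $D$, the unit closed disk in $\mathbb{C}$,

$$(a_n) \mapsto \sum_{n=0}^\infty a_n z^n.$$

(b) $G : \ell^1(\mathbb{Z}) \to C(S^1)$ is similar, $\widehat{x}(\theta) := \sum_{n=-\infty}^\infty a_n e^{in\theta}$. It follows that $\sigma(x) = \{\widehat{x}(\theta) : 0 \le \theta < 2\pi\}$, and the sequence $x$ is invertible in $\ell^1(\mathbb{Z})$ (in the convolution sense) exactly when $\sum_n a_n e^{in\theta} \ne 0$ for all $\theta$. This is essentially Wiener’s theorem: If $f \in C(S^1)$ is nowhere 0 and $\widehat{f} \in \ell^1(\mathbb{Z})$ then the Fourier coefficients of $1/f$ are also in $\ell^1(\mathbb{Z})$.

(c) Fourier coefficients: $L^1(S^1) \to C(\mathbb{Z}) = \ell^\infty(\mathbb{Z})$,

$$\widehat{f}(n) := \int_0^{2\pi} e^{-in\theta} f(\theta)\,d\theta.$$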
(d) Fourier transform: $L^1(\mathbb{R}) \to C(\mathbb{R})$,

$$\widehat{f}(\xi) := \int_{-\infty}^\infty e^{-ix\xi} f(x)\,dx.$$

(e) Laplace transform: $L^1(\mathbb{R}^+) \to C(\mathbb{C}^+)$,

$$\mathcal{L}f(s) := \int_0^\infty e^{-sx} f(x)\,dx, \qquad \operatorname{Re} s \ge 0.$$

In all these cases, $\widehat{f * g} = \widehat{f}\,\widehat{g}$.


17. ✶ In any Banach algebra, if $ST = TS$ then $\sigma(S + T) \subseteq \sigma(S) + \sigma(T)$ and $\sigma(ST) \subseteq \sigma(S)\sigma(T)$. (Hint: Consider the commutant algebra $\{S, T\}''$ (Exercise 13.10(14) and Example 13.5(6)).)


18. In a commutative Banach algebra, $e^{S+T} = e^S e^T$, and $De^T = e^T$. The set of exponentials $e^{\mathcal{X}}$ is a connected group, so $e^{\mathcal{X}} = \mathcal{E} = \mathcal{G}_1$ (Proposition 13.24).


19. A Banach algebra which satisfies $\|T^2\| = \|T\|^2$ is isometrically isomorphic to a subalgebra of $C(\Delta)$: the condition is equivalent to $\|T\| = \rho(T) = \|\widehat{T}\|$.


20. Conversely to the proposition, a Banach algebra that can be embedded in some $C(K)$ ($K$ compact) satisfies $\|T\|^2 \le c\,\|T^2\|$.


Remarks 14.41 


1. Given a compact set $K \subset \mathbb{C}$, is there an element $T$ with spectrum $\sigma(T) = K$? Of course, this is false in the Banach algebra $\mathbb{C}$, where all spectra consist of single points, and in $B(\mathbb{C}^N)$, where the spectra are finite sets of points. But in $\ell^\infty$ there are elements with any given compact set $K$ for spectrum (Example 14.2(3)).


2. The distinction between $\sigma_p$, $\sigma_c$ and $\sigma_r$ is not purely of mathematical interest. In quantum mechanics, a solution of Schrödinger’s time-independent equation $H\psi = E\psi$ gives energy-eigenvalues with eigenfunctions that are “localized” (since $\psi \in L^2(\mathbb{R}^3)$), whereas the continuous spectrum corresponds to “free” states.


3. Among the operators in Section 14.2, one can find examples without point, continuous or residual spectra (and any combination thereof, except all empty). Note also that the spectra of these examples are misleadingly not hard to compute in contrast to generic operators.


4. There are various definitions of spectra of $T$ that are subsets of $\sigma(T)$. The singular spectrum is the set of $\lambda$ such that $T - \lambda$ is a topological divisor of zero. The essential spectrum consists of $\lambda$ such that $T - \lambda$ is not Fredholm.
5. Recalling $\rho_x(T) := \limsup_n \|T^n x\|^{1/n}$, defined for $T \in B(X)$ and $x \in X$ (Remark 13.32(4)), suppose a closed subset $\sigma_1$ of the spectrum of $T$ is isolated from the rest of the spectrum by a disk, $\sigma_1 \subseteq B_r(a)$. If $\rho_x(T - a) < r$ then $x \in X_1 := \operatorname{im} P_1$ since


$$P_1 x = \frac{1}{2\pi i} \oint_{\sigma_1} (z - T)^{-1} x\,dz = \sum_{n=0}^\infty \frac{1}{2\pi i} \oint_{\sigma_1} \frac{(T - a)^n x}{(z - a)^{n+1}}\,dz = x.$$


6. The Gelfand transform can be extended to $\mathcal{S}(\mathcal{X}) \to \mathcal{S}(T)$ retaining the same (non-multiplicative) properties.


Chapter 15 
C*-Algebras 


$B(H)$ is a special Banach algebra when $H$ is a Hilbert space because there is an adjoint operation that pairs up operators together. Its properties can be generalized to Banach algebras as follows.


Definition 15.1 


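A (unital) C*-algebra is a unital Banach algebra with an involution map $* : \mathcal{X} \to \mathcal{X}$ having the properties:


$$T^{**} = T, \qquad (S + T)^* = S^* + T^*, \qquad (\lambda T)^* = \bar{\lambda} T^*,$$
$$(ST)^* = T^* S^*, \qquad \|T^*T\| = \|T\|^2.$$


A *-morphism is defined as a Banach algebra morphism $\Phi$ which also preserves the involution $\Phi(T^*) = (\Phi T)^*$.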


Easy Consequences 


1. $0^* = 0$, $1^* = 1$, $z^* = \bar{z}$ (by expanding $(0 + 1)^*$, $(1^*1)^*$, and $(z1)^*$).


2. $\|T^*\| = \|T\|$ (since $\|T\|^2 = \|T^*T\| \le \|T^*\|\|T\|$, and so $\|T\| \le \|T^*\| \le \|T^{**}\|$); the involution map is thus continuous and bijective. But it is neither linear ($(iT)^* = -iT^*$), nor differentiable (since $(T + H)^* = T^* + H^*$).


3. $\|TT^*\| = \|T\|^2$.
4. $(T^*)^{-1} = (T^{-1})^*$ when $T$ is invertible.
5. $\rho(T^*) = \rho(T)$, $\sigma(T^*) = \sigma(T)^*$ (since $(T^* - \bar{\lambda})^{-1} = ((T - \lambda)^{-1})^*$).¹


One might expect that $\|T^*\| = \|T\|$ be taken as an axiom, and indeed Banach algebras with involutions satisfying this weaker axiom are studied and called Banach *-algebras. C*-algebras resemble $\mathbb{C}$ more closely, except for commutativity: the chosen axiom, which is the analogue of the familiar one $\bar{z}z = |z|^2$, is much stronger and can only be satisfied by a unique norm, if at all (Example 15.10(6)).


Examples 15.2 


1. The simplest example is $\mathbb{C}$ with conjugacy. $\mathbb{C}^N$ has an involution


$$(z_1, \ldots, z_N)^* := (\bar{z}_1, \ldots, \bar{z}_N).$$


This example extends to $\ell^\infty$.
2. $C[0, 1]$ with conjugacy, $\bar{f}(z) := \overline{f(z)}$.


3. $B(H)$ with the adjoint operator, where $H$ is a Hilbert space (Proposition 10.20). We will see later (Gelfand-Naimark’s Theorem 15.48) that every C*-algebra can be embedded into $B(H)$ for some Hilbert space $H$.


4. $B(H)$ contains the closed *-subalgebra
$$\mathbb{C} \oplus \mathcal{K} := \{\, a + T : a \in \mathbb{C},\ T \in B(H) \text{ compact} \,\}.$$
5. If $\mathcal{X}$ and $\mathcal{Y}$ are C*-algebras then so is $\mathcal{X} \times \mathcal{Y}$ with $(S, T)^* := (S^*, T^*)$ (Examples 13.3(7)).


6. ▶ $\ell^1(\mathbb{Z})$ has an involution $(a_n)^* := (\bar{a}_{-n})$, that satisfies $\|x^*\| = \|x\|$ but not $\|x^* * x\| = \|x\|^2$. However, it can be given a new norm, $|||x||| := \|L_x\|$ where $L_x y := x * y$ for $y \in \ell^2$, and $L : x \mapsto L_x$ embeds $\ell^1(\mathbb{Z})$ as a commutative C*-subalgebra of $B(\ell^2)$. Similarly for $L^1(\mathbb{R})$.


7. The group algebra $\mathbb{C}^G$ has an involution making it a *-algebra, but not a C*-algebra,
$$a^* = \Big(\sum_{g \in G} a_g e_g\Big)^* := \sum_{g \in G} \bar{a}_g e_{g^{-1}}.$$
(However, it is a C*-algebra when represented by matrices and their norms.)
Exercises 15.3 


1. Polarization identity: If $\omega$ is a primitive root of unity, $\omega^n = 1$, then


$$T^*S = \frac{1}{n} \sum_{i=1}^n \omega^i (S + \omega^i T)^*(S + \omega^i T),$$

$$S^*S + T^*T = \frac{1}{n} \sum_{i=1}^n (S + \omega^i T)^*(S + \omega^i T).$$

¹ To avoid ambiguity with the closure $\bar{A}$ of a set $A \subseteq \mathbb{C}$, $A^*$ will denote the set of conjugate numbers $\{\bar{z} : z \in A\}$.
2. For any real polynomial (or power series) in T, p(T)* = p(T*). 


3. If $T$ is a nilpotent, a quasinilpotent, a divisor of zero, or a topological divisor of zero, then so is $T^*$, respectively. If $T^*T$ is a nilpotent, then so is $TT^*$; but find an example in $B(\ell^2)$ where $T^*T$ is invertible yet $TT^*$ isn’t.


4. If $T^*T$ and $TT^*$ are both invertible then so is $T$,
$$T^{-1} = (T^*T)^{-1}T^* = T^*(TT^*)^{-1}.$$


5. If the condition number of $T$ is $c$, that of $T^*T$ is $c^2$ (Exercise 8.14(5)).


6. The inner-automorphism $T \mapsto S^{-1}TS$ is a *-automorphism exactly when $SS^*$ belongs to the center $\mathcal{X}'$ (in which case $S^*S = SS^*$).


7. ✶ A *-isomorphism $B(H_1) \to B(H_2)$ is of the type $T \mapsto LTL^{-1}$ where $L = \lambda U$, $\lambda \ne 0$ real, and $U : H_1 \to H_2$ is a Hilbert-space isomorphism.


8. A x-ideal is an ideal that is closed under involution. Examples include the kernel 
of any *-morphism and the Jacobson radical. 


9. If $\mathcal{A} \subseteq \mathcal{X}$ is closed under adjoints ($\mathcal{A}^* = \mathcal{A}$), then so is its commutant $\mathcal{A}'$ (which is thus a C*-subalgebra) (Exercise 13.10(14)).


10. ✶ Suppose $\mathcal{X}$ has no unity but otherwise satisfies all the axioms of a C*-algebra. Show that the embedding $L : \mathcal{X} \to B(\mathcal{X})$ (Theorem 13.8) is still isometric, and that $L\mathcal{X} \oplus [\![I]\!]$ with the adjoint operation $(L_a + \lambda)^* := L_{a^*} + \bar{\lambda}$ is a unital C*-algebra.


15.1 Normal Elements 


It is a well-known fact in Linear Algebra that real symmetric matrices are diagonalizable with real eigenvalues and orthogonal eigenvectors. This makes them particularly useful and simple to work with, e.g. if $T = PDP^{-1}$ then $f(T) = Pf(D)P^{-1}$ can easily be calculated when $D$ is diagonal. However, these matrices do not exhaust the set of diagonalizable matrices via orthogonal eigenvectors: for example, diagonalizable matrices, such as $\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$, may have complex eigenvalues. As we shall see later, diagonalization is closely related to the commutativity of $T$ with $T^*$.
Definition 15.4


An element $T$ is called normal when $T^*T = TT^*$, unitary when $T^* = T^{-1}$, and self-adjoint when $T^* = T$.


Examples 15.5 


1. It is clear that self-adjoint and unitary elements are normal.


2. Any $z \in \mathbb{C}$ is normal; it is self-adjoint only when $z \in \mathbb{R}$; it is unitary only when $|z| = 1$.


3. A diagonal matrix is normal; it is self-adjoint when it is real, and unitary when each diagonal element is of unit length $|a_{ii}| = 1$.

More generally, diagonalizable matrices, of the type $T = UDU^*$ where $U$ is unitary and $D$ is diagonal, are normal: $T^*T = UD^*U^*UDU^* = UD^*DU^* = UDD^*U^* = TT^*$.


4. The operator $Tf(x) := \int_0^1 k(x, y) f(y)\,dy$ on $L^2[0, 1]$ is normal when (Example 8.6(4c))

$$\int_0^1 \overline{k(s, x)}\,k(s, y)\,ds = \int_0^1 k(x, s)\,\overline{k(y, s)}\,ds \quad \text{a.e. } (x, y).$$


5. When $T$ is normal, a polynomial in $T$ and $T^*$ looks like


$$p(T, T^*) := \sum_{n=0}^N \sum_{m=0}^M a_{n,m} T^n T^{*m}.$$


The set of such polynomials $\mathbb{C}[T, T^*]$ is a commutative *-subalgebra. The character space of its closure $\overline{\mathbb{C}[T, T^*]}$ is denoted by $\Delta_T$.


6. A unitary matrix is a square matrix whose column vectors are orthonormal. A


self-adjoint matrix is a square matrix [a;;] such that aj; = aj;, e.g. 7 a 
Proof If $u_i$ denotes the $i$th column of $U$, then $U^*U = I$ implies


$$\langle u_i, u_j \rangle = \delta_{ij}.$$


7. The unitary operators of $B(H)$ are the Hilbert-space automorphisms of $H$ (Proposition 10.23).


8. ▶ If $T$ is normal, then so are $T^*$, $T + z$, $zT$, $T^n$, and $T^{-1}$ when it exists. But the
addition and product of normal elements need not be normal, e.g. ; ;) and 
Proof for $T^{-1}$. Taking the inverse of $TT^* = T^*T$ together with $(T^{-1})^* = (T^*)^{-1}$ gives the normalcy of $T^{-1}$.


9. ▶ If $T_n$ are normal and $T_n \to T$, then $T$ is also normal, i.e., the set of normal elements is closed (as are the sets of self-adjoint and unitary elements).


Proof The limit as $n \to \infty$ of $T_n^* T_n = T_n T_n^*$ is $T^*T = TT^*$ since the adjoint is continuous. Similarly take the limit of $T_n^* = T_n$ or $T_n^* = T_n^{-1}$ to prove the other statements.


10. ▶ If $S$, $T$ are self-adjoint, then so are $S + T$, $\lambda T$ ($\lambda \in \mathbb{R}$), $p(T)$ for any real polynomial $p$, and $T^{-1}$ if it exists. But $ST$ is self-adjoint iff $ST = TS$.


11. ▶ If $T$ is self-adjoint, then $e^{iT}$ is unitary; in fact, letting $U_t := e^{itT}$, $t \in \mathbb{R}$, gives a one-parameter group of unitary elements (Exercises 13.25(9) for definition).


The analogy of self-adjoint elements with real numbers and unitary elements with 
unit complex numbers raises the issue of which propositions about complex numbers 
generalize to C*-algebras. 


Proposition 15.6 


Every element $T$ can be written uniquely as $A + iB$ with $A$ and $B$ self-adjoint, called the real and imaginary parts of $T$, respectively.


The real and imaginary parts of $T$ are denoted $\operatorname{Re} T$ and $\operatorname{Im} T$.


Proof Simply check that $A := (T + T^*)/2$ and $B := (T - T^*)/2i$ are self-adjoint. The sum $A + iB$ is obviously $T$. Uniqueness follows from the fact that if $A + iB = 0$ for $A$, $B$ self-adjoint then $A = 0 = B$ since

$$A = A^* = (-iB)^* = iB = -A. \qquad \square$$
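
The decomposition in the proof can be tried out on a small matrix. The following is an illustrative sketch only (the helper names `adjoint`, `combine`, `real_part`, `imag_part`, `distance` and the sample matrix are assumptions made for this example): it forms $A = (T + T^*)/2$ and $B = (T - T^*)/2i$ and measures how far $A$, $B$ are from being self-adjoint and how far $A + iB$ is from $T$.

```python
def adjoint(T):
    """Conjugate transpose of a square matrix given as a tuple of rows."""
    n = len(T)
    return tuple(tuple(complex(T[j][i]).conjugate() for j in range(n)) for i in range(n))

def combine(S, T, s, t):
    """Entrywise linear combination s*S + t*T."""
    n = len(S)
    return tuple(tuple(s * S[i][j] + t * T[i][j] for j in range(n)) for i in range(n))

def real_part(T):
    # A = (T + T*) / 2
    return combine(T, adjoint(T), 0.5, 0.5)

def imag_part(T):
    # B = (T - T*) / 2i, written as multiplication by -i/2
    return combine(T, adjoint(T), -0.5j, 0.5j)

def distance(S, T):
    """Largest entrywise difference between two matrices."""
    n = len(S)
    return max(abs(S[i][j] - T[i][j]) for i in range(n) for j in range(n))

T = ((1 + 2j, 3 + 0j), (4j, 5 - 1j))
A, B = real_part(T), imag_part(T)

self_adjoint_error = max(distance(A, adjoint(A)), distance(B, adjoint(B)))
reconstruction_error = distance(combine(A, B, 1, 1j), T)
```

Both error values should be zero up to rounding, in line with the proposition.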


Proposition 15.7 


The set of unitary elements $\mathcal{U}(\mathcal{X})$ is a closed subgroup of $\mathcal{G}(\mathcal{X})$,
$$U, V \text{ unitary} \;\Rightarrow\; UV,\ U^{-1} \text{ unitary}.$$
Unitary elements have unit norm, $\|U\| = 1$.
Proof If $U_n$ are unitary and $U_n \to T$, then by continuity of the involution, $U_n^* \to T^*$.


Also, the equations $U_n^* U_n = 1 = U_n U_n^*$ become $T^*T = 1 = TT^*$ in the limit, that is $T^{-1} = T^*$.
For any $U, V \in \mathcal{U}(\mathcal{X})$, $UV$ and $U^*$ ($= U^{-1}$) are also unitary,
$$(UV)^* = V^*U^* = V^{-1}U^{-1} = (UV)^{-1},$$
$$U^{**} = U = (U^{-1})^{-1} = (U^*)^{-1}.$$
Finally, $\|U\|^2 = \|U^*U\| = \|1\| = 1$. $\square$


The next theorem starts to unravel the close connection between normal elements 
and their spectra. 


Proposition 15.8 


For $T$ normal, $\rho(T) = \|T\|$, and $\mathcal{S}(T)$ is the closed convex hull of $\sigma(T)$.


Proof (i) For any normal element $T$, $\|T^2\| = \|T\|^2$ since

$$\|T^2\|^2 = \|(T^2)^* T^2\| = \|(T^*T)^*(T^*T)\| = \|T^*T\|^2 = \|T\|^4.$$


But $T^2$ itself is normal, so the doubling game can be repeated to get, by induction, $\|T^{2^k}\| = \|T\|^{2^k}$ and


$$\rho(T) = \lim_{n\to\infty} \|T^n\|^{1/n} = \lim_{k\to\infty} \|T^{2^k}\|^{2^{-k}} = \|T\|.$$


(ii) As $\mathcal{S}(T)$ is a closed convex set that contains $\sigma(T)$ (Proposition 14.34), it must also contain the closed convex hull of the latter. Notice that, by (i), $\sigma(T)$ reaches to the boundary of $\mathcal{S}(T)$.


Conversely, suppose $\lambda$ is not in the closed convex hull of $\sigma(T)$. There must be a straight line through $\lambda$ not intersecting $\sigma(T)$ (why? Hint: consider rays emanating from $\lambda$; they intersect the closed convex hull over an interval of angles). So the spectrum can be enclosed by a ball $B_r[z]$ that does not meet the line (Exercise 6.22(7)).


For any $\phi \in \mathcal{S}$,
$$|\phi T - z| = |\phi(T - z)| \le \|T - z\| = \rho(T - z) \le r < |\lambda - z|$$


so $\lambda \notin \mathcal{S}(T)$. It follows that $\mathcal{S}(T)$ has the same points as the closed convex hull of $\sigma(T)$. $\square$
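
Part (i) is easy to test on $2 \times 2$ matrices, where the eigenvalues come from the quadratic formula and the operator norm is $\sqrt{\rho(T^*T)}$ (Example 15.10(3b)). The sketch below is an illustration under these assumptions, with hypothetical helper names; it compares $\rho(T)$ and $\|T\|$ for one normal matrix and for one non-normal shear.

```python
import cmath
import math

def adjoint(T):
    (a, b), (c, d) = T
    return ((a.conjugate(), c.conjugate()), (b.conjugate(), d.conjugate()))

def matmul(S, T):
    return tuple(
        tuple(sum(S[i][k] * T[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def eigenvalues(T):
    """Roots of the characteristic polynomial z^2 - (trace) z + det."""
    (a, b), (c, d) = T
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)

def spectral_radius(T):
    return max(abs(lam) for lam in eigenvalues(T))

def operator_norm(T):
    # ||T|| = sqrt(rho(T*T)) for the Euclidean operator norm
    return math.sqrt(spectral_radius(matmul(adjoint(T), T)))

def is_normal(T, tol=1e-12):
    left, right = matmul(adjoint(T), T), matmul(T, adjoint(T))
    return all(abs(left[i][j] - right[i][j]) < tol for i in range(2) for j in range(2))

normal = ((1 + 0j, -1 + 0j), (1 + 0j, 1 + 0j))   # eigenvalues 1 + i and 1 - i
shear = ((1 + 0j, 1 + 0j), (0j, 1 + 0j))         # only eigenvalue 1, not normal

gap_normal = operator_norm(normal) - spectral_radius(normal)
gap_shear = operator_norm(shear) - spectral_radius(shear)
```

For the normal matrix the gap vanishes up to rounding, while for the shear the norm strictly exceeds the spectral radius, so normality cannot simply be dropped from the hypothesis.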




Proposition 15.9 Fuglede’s theorem 
If T is normal and ST = TS then ST* = T*S. 


Proof From $f(T)S = Sf(T)$ (Example 14.23(1b)), we have $e^{-zT} S e^{zT} = S$ for all $z \in \mathbb{C}$. Writing $\bar{z}T = A + iB$ and noting that $\bar{z}T$ is normal, so $AB = BA$ (Example 15.10(1a)), we find


$$F(z) := e^{-zT^*} S e^{zT^*} = e^{-A + iB} S e^{A - iB}$$
$$= e^{2iB} e^{-\bar{z}T} S e^{\bar{z}T} e^{-2iB}$$
$$= e^{2iB} S e^{-2iB}$$
$$\therefore\ \|F(z)\| \le \|S\| \quad \text{by Example 15.5(11).}$$

As $F$ is a bounded analytic function of $z$, by Liouville’s theorem it is constant, $F(z) = F(0) = S$, i.e., $e^{zT^*} S = S e^{zT^*}$. Comparing the second term of their power series gives $T^*S = ST^*$. $\square$
Examples 15.10 
1. If $T = A + iB$, where $A$, $B$ are self-adjoint, then $T^* = A - iB$ and


$$T^*T = (A^2 + B^2) + i[A, B],$$
$$TT^* = (A^2 + B^2) - i[A, B],$$


(note that $i[A, B] = \tfrac{1}{2}[T^*, T]$ is self-adjoint). So,
(a) $T$ is normal if, and only if, $AB = BA$;
(b) $T$ is unitary if, and only if, $AB = BA$ and $A^2 + B^2 = 1$;
(c) $T$ is self-adjoint if, and only if, $B = 0$.
2. $\mathcal{X}$ is commutative if, and only if, every element is normal.
Proof If every element is normal, then for any $T = A + iB$, $AB = BA$, i.e., any two self-adjoint elements commute. But then $TS = (A + iB)(C + iD) = ST$. The converse is obvious.
3. (a) For $T$ normal, $\|T^n\| = \|T\|^n$, since $\|T\| = \rho(T) \le \|T^n\|^{1/n} \le \|T\|$.
(b) For any $T$, $\|T\|^{2n} = \|(T^*T)^n\|$ and $\|T\| = \sqrt{\rho(T^*T)}$.
4. ▶ 0 is the only normal quasinilpotent and the only radical element, that is, every C*-algebra is semi-simple. More generally, if $T$ is normal with $\sigma(T) = \{z\}$, then $T = z$.


Proof If $Q$ is a normal quasinilpotent, then $\|Q\| = \rho(Q) = 0$, so $Q = 0$. If $P$ is a radical element, then $\|P\|^2 = \|P^*P\| = \rho(P^*P) = 0$.

Every C*-algebra has a unique norm satisfying $\|T^*T\| = \|T\|^2$.


Proof Suppose there is a second C*-norm $|||\cdot|||$. Then the norms must agree on normal elements, $|||T||| = \rho(T) = \|T\|$, and so must agree on all elements


$$|||T||| = |||T^*T|||^{1/2} = \|T^*T\|^{1/2} = \|T\|.$$


Exercises 15.11 


1. What are the normal, self-adjoint and unitary elements of $\ell^\infty$ and $C[0, 1]$?

2. Generalizing from diagonal matrices, any multiplier operator on $\ell^2$, $(a_n) \mapsto (b_n a_n)$ is normal. It is self-adjoint when $b_n \in \mathbb{R}$, and unitary when $|b_n| = 1$, for all $n$.

Find similar conditions for a multiplier operator on $L^2(\mathbb{R})$, $Tf := gf$, ($g \in C_b(\mathbb{R})$).


3. Triangular matrices, such as $\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}$ with $b \ne 0$, are not normal (unless diagonal). A real diagonalizable matrix, such as $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, need not be self-adjoint.


4. For any $T$, $\alpha T + \beta T^*$ is normal when $|\alpha| = |\beta|$.
5. A *-morphism preserves normal, self-adjoint, and unitary elements.


6. If $P_i$ are normal idempotents with $P_i P_j = \delta_{ij} P_i$ as well as $P_1 + \cdots + P_n = 1$, then $z_1 P_1 + \cdots + z_n P_n$ is normal (unitary when $|z_i| = 1$) and for any polynomial $p$,
$$p(z_1 P_1 + \cdots + z_n P_n) = p(z_1) P_1 + \cdots + p(z_n) P_n.$$


Unitary elements 


7. The shift-operators on $\ell^2(\mathbb{Z})$ are unitary, with $\sigma(R) = \sigma(L) = S^1$ (but on $\ell^2$, they are not even normal).


8. Translations $T_a f(x) := f(x - a)$ and stretches $S_a f(x) := a^{1/2} f(ax)$ ($a > 0$), acting on $L^2(\mathbb{R})$, are unitary.


9. If $U$ is unitary then for any $T$, $\|UT\| = \|T\| = \|TU\|$.
10. If $U \in \mathcal{X}$ is unitary and $V := \lambda U$ ($\lambda \ne 0$), then $T \mapsto V^{-1}TV$ is an inner *-automorphism of $\mathcal{X}$.


11. If $T$ is an invertible normal element, then $T^*T^{-1}$ is unitary.

For example, the Cayley transformation $U := (i - T^*)(i + T)^{-1}$ maps $T$ to a unitary element if $i + T$ is invertible. Compare with the Möbius transformation $z \mapsto (i - \bar{z})/(i + z)$, which takes $\mathbb{R}$ to the unit circle ($0 \mapsto 1$, $1 \mapsto i$, $\infty \mapsto -1$).


12. $\mathcal{U}(\mathcal{X})$ need not be a normal subgroup of $\mathcal{G}(\mathcal{X})$; when does $T^{-1}\mathcal{U}T \subseteq \mathcal{U}$ hold?
Self-Adjoint elements


13. The operator $Tf(x) := \int k(x, y) f(y)\,dy$ on $L^2(\mathbb{R})$ ($k \in L^2(\mathbb{R}^2)$) is self-adjoint when $k(y, x) = \overline{k(x, y)}$ a.e. (Hint: Examples 10.24(3), 8.6(4)).


14. For any $T \in \mathcal{X}$, the elements $T + T^*$, $T^*T$ and $TT^*$ are self-adjoint.
15. The real and imaginary parts of $T$ satisfy $\|\operatorname{Re} T\| \le \|T\|$, $\|\operatorname{Im} T\| \le \|T\|$.


16. Find the real and imaginary parts of $ST$ when $S$ and $T$ are self-adjoint.


Spectra of Normal Elements

17. For $S$, $T$ normal, $\rho(S + T) \le \rho(S) + \rho(T)$, and $\rho(ST) \le \rho(S)\rho(T)$.
18. When $T$ is normal, then $\|T\|e^{i\theta}$ is a spectral value for some $\theta$.


19. Let $Q \ne 0$ be a quasinilpotent, then $1 + Q$ is not normal. More generally, if $T$ is normal and $TQ = QT$, then $T + Q$ is not normal.


20. If $A^*B = 0 = AB^*$, then $\|A + B\| = \max(\|A\|, \|B\|)$. (Hint: Show $\|A + B\|^{2n} = \|(A^*A)^n + (B^*B)^n\|$.)


21. If $S$ and $T$ are commuting normal elements, then $ST$ is also normal.
22. If $T^*T$ is an idempotent then so is $TT^*$.


23. A commutative C*-algebra is isometrically embedded in some $C(K)$ (Exercise 14.40(19)).


24. Let $\Phi : \mathcal{X} \to \mathcal{Y}$ be a *-morphism between C*-algebras with $\mathcal{X}$ commutative. Then $\Phi(T)$ is normal in $\mathcal{Y}$ for any $T \in \mathcal{X}$, and $\Phi$ is continuous with $\|\Phi\| \le 1$ (Hint: $\sigma(\Phi(T)) \subseteq \sigma(T)$).

15.2 Normal Operators in $B(H)$


Let us see what properties normal elements have for the most important C*-algebra, $B(H)$ when $H$ is a Hilbert space.


Proposition 15.12 


For a normal operator $T \in B(H)$,
(i) $\|T^*x\| = \|Tx\|$,
(ii) $\ker T^* = \ker T = \ker T^2 = (\operatorname{im} T)^\perp$,
(iii) $\operatorname{im} T$ is dense in $H$ $\iff$ $T$ is 1–1,
(iv) $T$ is invertible in $B(H)$ $\iff$ $\exists c > 0$, $\forall x \in H$, $c\|x\| \le \|Tx\|$.
Proof (i) follows from
$$\|T^*x\|^2 = \langle T^*x, T^*x \rangle = \langle x, TT^*x \rangle = \langle x, T^*Tx \rangle = \langle Tx, Tx \rangle = \|Tx\|^2.$$


(ii) $\ker T = \ker T^*$ is due to $T^*x = 0 \iff \|T^*x\| = \|Tx\| = 0 \iff Tx = 0$, using (i). $\ker T^2 = \ker T$, i.e., $T^2x = 0 \iff Tx = 0$ follows from


$$\|Tx\|^2 = \langle x, T^*Tx \rangle \le \|x\|\,\|T^*Tx\| = \|x\|\,\|T^2x\|.$$


From Proposition 10.21, $(\operatorname{im} T)^\perp = \ker T^* = \ker T$.
(iii) By (ii), $T$ is 1–1 if, and only if, $\overline{\operatorname{im} T} = (\ker T)^\perp = 0^\perp = H$.


(iv) If $T$ has a continuous inverse, then $\|x\| = \|T^{-1}Tx\| \le \|T^{-1}\|\,\|Tx\|$. Conversely, if the given inequality is true for all $x \in H$, then $T$ is 1–1 and the image of $T$ is closed (Examples 8.13(3)). By (iii), $\operatorname{im} T = H$ and $T$ is bijective. Its inverse is continuous:


$$c\,\|T^{-1}x\| \le \|TT^{-1}x\| = \|x\|, \quad \forall x \in H. \qquad \square$$
Proposition 15.13 


For a normal operator $T \in B(H)$,
(i) $Tv = \lambda v \iff T^*v = \bar{\lambda}v$, and eigenvectors of distinct eigenvalues of $T$ are orthogonal,
(ii) $\sigma(T)$ contains no residual spectrum, $\sigma_r(T) = \varnothing$,
(iii) isolated points of $\sigma(T)$ are eigenvalues.


Proof (i) is a direct application of $\ker(T - \lambda) = \ker(T^* - \bar{\lambda})$, as $T - \lambda$ is normal. Note that the eigenvectors of $T$ and $T^*$ are identical. For eigenvalues $\lambda$ and $\mu$ with corresponding eigenvectors $x$ and $y$, we have


$$\lambda \langle y, x \rangle = \langle y, Tx \rangle = \langle T^*y, x \rangle = \langle \bar{\mu}y, x \rangle = \mu \langle y, x \rangle,$$


implying either $\lambda = \mu$ or $\langle y, x \rangle = 0$.

(ii) Let $\lambda \in \sigma(T)$; either $T - \lambda$ is not 1–1, in which case $\lambda$ is an eigenvalue (point spectrum); or it is 1–1, in which case its image is dense in $H$ by the previous proposition, and $\lambda$ forms part of the continuous spectrum.


(iii) If $\{\lambda\}$ is an isolated point of $\sigma(T)$, form the projection


$$P := \frac{1}{2\pi i} \oint_{\{\lambda\}} (z - T)^{-1}\,dz$$


onto a space $X_\lambda \ne 0$ (Example 14.28(1)). Then $\sigma(T|_{X_\lambda}) = \{\lambda\}$, and since $T|_{X_\lambda}$ is normal as well, $\|T|_{X_\lambda} - \lambda\| = \rho(T|_{X_\lambda} - \lambda) = 0$, i.e., $Tx = \lambda x$ for any $x \in X_\lambda$. $\square$


Examples 15.14 


1. ▶ A projection $P \in B(H)$ is normal $\iff$ self-adjoint $\iff$ orthogonal $\iff$ $\|P\| = 0$ or 1.


Proof If $P$ is orthogonal (Theorem 10.12), then $(x - Px) \perp Px$, so
$$\langle x, Px \rangle = \langle (I - P)x + Px, Px \rangle = \|Px\|^2 \in \mathbb{R}$$


hence $\langle x, Px \rangle = \langle Px, x \rangle = \langle x, P^*x \rangle$ for all $x \in H$, and $P = P^*$ (Example 10.7(3)).

If $\|P\| = 1$, let $x \in (\ker P)^\perp$, so that $x \perp x - Px$. Then $\|Px\|^2 = \|x\|^2 + \|Px - x\|^2$, yet $\|Px\| \le \|x\|$, so $x = Px \in \operatorname{im} P$ and $\ker P \perp \operatorname{im} P$. The other implications should be obvious.


2. All spectral values of a normal operator are approximate eigenvalues (either eigenvalues or part of the continuous spectrum) and there are no proper generalized eigenvectors (Section 14.3). Note that a normal operator need not have any eigenvalues, e.g. $Tf(x) := xf(x)$ on $L^2[0, 1]$.


Exercises 15.15 


1. ▶ Conversely to the proposition, an operator which satisfies $\|T^*x\| = \|Tx\|$ for all $x$ is normal.


2. When $T$ is a normal operator, $\ker T$ and $\operatorname{im} T$ are both $T$- and $T^*$-invariant.


3. Suppose $T_n x \to Tx$ for all $x \in H$ where $T_n$ are normal operators in $B(H)$. ($T$ is an operator by Corollary 11.35.) Then $T$ is normal if, and only if, $\forall x$, $T_n^* x \to T^*x$.


4. The eigenvalues of self-adjoint operators are real, and those of unitary operators satisfy $|\lambda| = 1$.


5. A normal operator on a separable Hilbert space can have at most a countable number of distinct eigenvalues.


6. Suppose $H$ has an orthonormal basis of eigenvectors of an operator $T \in B(H)$. Show that $T$ is normal. (Hint: show $\|T^*x\| = \|Tx\|$.)


7. If $Tx = \lambda x$, $T^*y = \bar{\mu}y$, and $\mu \ne \lambda$ then $\langle y, x \rangle = 0$ ($T$ not necessarily normal).


8. An Ergodic Theorem: Consider the Cesàro sum


$$T_n := (I + T + \cdots + T^{n-1})/n.$$


If $\rho(T) < 1$ then $T_n \approx (I - T)^{-1}/n \to 0$ as $n \to \infty$. Now let $T$ be a normal operator with $\rho(T) = 1$.
(a) For $Tx = x$ (i.e., $x \in \ker(T - I)$), we get $T_n x = x$;
(b) For $x = y - Ty \in \operatorname{im}(T - I)$ we get $T_n x = (y - T^n y)/n \to 0$;
(c) For any $x \in H$, $T_n x \to x_0 \in \ker(T - I)$, the closest fixed point of $T$.


If $T$ is not normal then $T_n$ may diverge, e.g. $T = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ gives $T_n = \begin{pmatrix} 1 & (n-1)a/2 \\ 0 & 1 \end{pmatrix}$.
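
The contrast between the normal and the non-normal case can be seen numerically. The sketch below is illustrative only (the function names and the choice $n = 402$ are assumptions for this example): it forms the Cesàro mean $T_n = (I + T + \cdots + T^{n-1})/n$ for the normal matrix $\operatorname{diag}(1, i)$, whose fixed space is spanned by the first basis vector, and for the shear with $a = 1$.

```python
def matmul(S, T):
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def cesaro_mean(T, n):
    """T_n = (I + T + ... + T^(n-1)) / n, accumulated power by power."""
    size = len(T)
    power = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    total = [[0] * size for _ in range(size)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(size)] for i in range(size)]
        power = matmul(power, T)
    return [[entry / n for entry in row] for row in total]

normal = [[1, 0], [0, 1j]]   # diagonal, hence normal, with rho(T) = 1
shear = [[1, 1], [0, 1]]     # the non-normal example with a = 1

mean_normal = cesaro_mean(normal, 402)
mean_shear = cesaro_mean(shear, 402)
```

For the normal matrix the mean approaches the orthogonal projection onto $\ker(T - I)$, i.e. $\operatorname{diag}(1, 0)$, while the off-diagonal entry of the shear's mean equals $(n - 1)/2$ and grows without bound.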


The Numerical Range 


To help us further with analyzing the spectra of normal operators, we require an additional tool. A given vector $x$ need not, of course, be an eigenvector of an operator $T$, but we can ask for that value of $\lambda$ which minimizes $\|Tx - \lambda x\|$. According to Theorem 10.12 there is indeed a unique vector $\lambda x \in [\![x]\!]$ which is closest to $Tx$, and it satisfies $(Tx - \lambda x) \perp x$, or equivalently, $\lambda = \langle x, Tx \rangle / \|x\|^2$. This number is sometimes called the mean value of $T$ at $x$, or the Rayleigh coefficient, and denoted by $\langle T \rangle_x$. We are thus led to the following definition:


Definition 15.16 


The numerical range of an operator $T \in B(H)$ is the set


$$W(T) := \{\, \langle x, Tx \rangle : \|x\| = 1 \,\}.$$
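
The minimizing property of the Rayleigh coefficient that motivates this definition can be illustrated in finite dimensions. The following sketch is not from the text (the helper names, the sample matrix and vector are assumptions): it computes $\lambda = \langle x, Tx \rangle / \|x\|^2$, checks that the residual $Tx - \lambda x$ is orthogonal to $x$, and compares the residual norm with that of a few perturbed values of $\lambda$.

```python
import math

def inner(x, y):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def apply(T, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in T]

def norm(x):
    return math.sqrt(inner(x, x).real)

def rayleigh(T, x):
    """Mean value <T>_x = <x, Tx> / ||x||^2."""
    return inner(x, apply(T, x)) / inner(x, x).real

def residual(T, x, lam):
    Tx = apply(T, x)
    return [Tx[i] - lam * x[i] for i in range(len(x))]

T = [[2, 1j], [1, 3]]
x = [1 + 1j, 2]

lam = rayleigh(T, x)
r = residual(T, x, lam)
perturbed = [norm(residual(T, x, lam + d)) for d in (0.1, -0.1, 0.1j, -0.1j, 0.3 + 0.2j)]
```

By Pythagoras' theorem, $\|Tx - (\lambda + d)x\|^2 = \|Tx - \lambda x\|^2 + |d|^2\|x\|^2$, so every entry of `perturbed` exceeds `norm(r)`.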


Examples 15.17 


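1. $\langle 1 \rangle_x = 1$, $\langle T + S \rangle_x = \langle T \rangle_x + \langle S \rangle_x$, $\langle \lambda T \rangle_x = \lambda \langle T \rangle_x$, $\langle T^* \rangle_x = \overline{\langle T \rangle_x}$.


These are easily verified, e.g.


$$\langle x, T^*x \rangle = \langle Tx, x \rangle = \overline{\langle x, Tx \rangle}.$$


2. ▶ For operators on a complex Hilbert space,


(a) $W(I) = \{1\}$, and $W(z) = \{z\}$ ($z \in \mathbb{C}$),

(b) $W(T + z) = W(T) + z$ (translations), and $W(\lambda T) = \lambda W(T)$,
(c) $W(S + T) \subseteq W(S) + W(T)$,

(d) $W(T^*) = W(T)^*$.


3. $W(T)$ includes the eigenvalues of $T$ and is bounded by $\|T\|$.


Proof If $Tx = \lambda x$ for $x$ a unit vector, then $\langle T \rangle_x = \langle x, Tx \rangle = \lambda$. Also, for unit $x$, $|\langle x, Tx \rangle| \le \|Tx\| \le \|T\|$ by the Cauchy-Schwarz inequality.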
4. Although the quadratic form $x \mapsto \langle x, Tx \rangle$ is unique to $T$, i.e., $\langle x, Tx \rangle = \langle x, Sx \rangle$ for all $x$ if, and only if, $T = S$ (Example 10.7(3)), the numerical range $W(T)$ does not identify $T$ in general, e.g. $W(U^{-1}TU) = W(T)$ when $U$ is unitary.

5. For a fixed unit $x \in H$, one can define two semi-inner-products on $B(H)$,

(a) $\langle S, T \rangle := \langle Sx, Tx \rangle = \langle S^*T \rangle_x$ (with associated semi-norm $\|T\|_x := \|Tx\|$), and


(b) the covariance semi-inner-product


$$\operatorname{Cov}(S, T) := \langle S - \langle S \rangle_x,\ T - \langle T \rangle_x \rangle = \langle S^*T \rangle_x - \overline{\langle S \rangle_x}\,\langle T \rangle_x,$$


with the associated semi-norm called the standard deviation
$$\sigma_T^2 := \operatorname{Cov}(T, T) = \|Tx\|^2 - |\langle T \rangle_x|^2.$$

(c) The uncertainty principle states that $\sigma_S \sigma_T \ge |\operatorname{Cov}(S, T)|$ (essentially the Cauchy-Schwarz inequality (Exercise 10.10(17))). The normalized inner product $\operatorname{Cov}(S, T)/\sigma_S \sigma_T$ is called the correlation; $T$ and $S$ are called independent when they are orthogonal, $\operatorname{Cov}(S, T) = 0$, so that $\langle S, T \rangle = \overline{\langle S \rangle_x}\,\langle T \rangle_x$.


These definitions are usually applied to $L^2(A)$, where $x$ corresponds to a function $p \in L^2(A)$, with $|p(s)|^2$ interpreted as a probability distribution, and the operators are multiplications by functions $Tp := fp$, that is,


the mean $\langle f \rangle_p = \int_A f(s)|p(s)|^2\,ds$, the rms $\|f\|_p = \sqrt{\int_A |f|^2 |p|^2}$,
$\operatorname{Cov}(f, g) = \int_A \overline{(f - \langle f \rangle)}\,(g - \langle g \rangle)\,|p|^2$.
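
As a finite-dimensional illustration of the covariance semi-inner-product, the sketch below (hypothetical helper names, arbitrarily chosen matrices and unit vector) evaluates $\operatorname{Cov}(S, T)$ from its definition, compares it with $\langle Sx, Tx \rangle - \overline{\langle S \rangle_x}\langle T \rangle_x$, and checks the inequality $\sigma_S \sigma_T \ge |\operatorname{Cov}(S, T)|$. It assumes the inner product is conjugate-linear in its first argument, as in $\langle x, Tx \rangle$ above.

```python
import math

def inner(x, y):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def apply(T, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in T]

def mean(T, x):
    """<T>_x = <x, Tx> for a unit vector x."""
    return inner(x, apply(T, x))

def covariance(S, T, x):
    """Cov(S, T) = <(S - <S>_x) x, (T - <T>_x) x>."""
    s, t = mean(S, x), mean(T, x)
    Sx, Tx = apply(S, x), apply(T, x)
    u = [Sx[i] - s * x[i] for i in range(len(x))]
    v = [Tx[i] - t * x[i] for i in range(len(x))]
    return inner(u, v)

def deviation(T, x):
    return math.sqrt(covariance(T, T, x).real)

x = [0.6, 0.8j]                      # a unit vector in C^2
S = [[1, 2], [0, 1j]]
T = [[0, 1], [1, 0]]

cov = covariance(S, T, x)
expanded = inner(apply(S, x), apply(T, x)) - mean(S, x).conjugate() * mean(T, x)
sigma_product = deviation(S, x) * deviation(T, x)
```

The two expressions for the covariance agree up to rounding, and `sigma_product` dominates `abs(cov)`, which is the Cauchy-Schwarz inequality for this semi-inner-product.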


We can now elucidate the connection between the numerical range and the spec- 
trum of an operator, hinted at in the examples above. 


Proposition 15.18 (Hausdorff-Toeplitz) 


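$\overline{W(T)}$ is a convex compact subset of $\mathbb{C}$, such that


$$\sigma(T) \subseteq \overline{W(T)} \subseteq \mathcal{S}(T).$$


Proof Recall the state space $\mathcal{S}(\mathcal{X})$ from Definition 14.33, where we now take the case $\mathcal{X} = B(H)$. The inclusion $W(T) \subseteq \mathcal{S}(T)$ is obvious: for any unit vector $x$, the functional $\phi(T) := \langle x, Tx \rangle$ is linear in $T$, maps $I$ to 1, and $|\phi(T)| = |\langle x, Tx \rangle| \le \|T\|$, so $\|\phi\| = 1$ and $\phi \in \mathcal{S}$. As $\mathcal{S}(T)$ is compact (Proposition 14.34), so must be its closed subset $\overline{W(T)}$.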
The main part of the proof is to show the other inclusion $\sigma(T) \subseteq \overline{W(T)}$: for $\|x\| = 1$, $\lambda \in \mathbb{C}$,


$$\alpha := d(\lambda, W(T)) \le |\langle x, Tx \rangle - \lambda| = |\langle x, (T - \lambda)x \rangle| \le \|(T - \lambda)x\|,$$


so for any $x \in H$,
$$\alpha\|x\| \le \|(T - \lambda)x\|.$$


When $\lambda \notin \overline{W(T)}$, $\alpha$ is strictly positive, and the inequality shows that $T - \lambda$ is 1–1 with a closed image (Example 8.13(3)). Moreover, since $W(T^*) = W(T)^*$ and $d(\bar{\lambda}, W(T)^*) = d(\lambda, W(T))$,


$$\alpha\|x\| \le \|(T^* - \bar{\lambda})x\|.$$


This implies that $(T - \lambda)^*$ is 1–1, hence $T - \lambda$ is onto (Proposition 10.21). Thus $T - \lambda$ has an inverse, which is continuous (Proposition 8.12),


$$\alpha\|(T - \lambda)^{-1}x\| \le \|(T - \lambda)(T - \lambda)^{-1}x\| = \|x\|,$$


and $\lambda \notin \sigma(T)$.

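$W(T)$ is convex: Given $\lambda$, $\mu$ in $W(T)$ ($\lambda \ne \mu$), let $x$, $y$ be unit vectors such that $\langle x, Tx \rangle = \lambda$, $\langle y, Ty \rangle = \mu$. Any vector $v := \alpha e^{i\phi_1}x + \beta e^{i\phi_2}y$ ($\alpha, \beta, \phi_1, \phi_2 \in \mathbb{R}$) has norm


$$\|v\|^2 = \alpha^2 + 2\alpha\beta \operatorname{Re}\big(e^{i(\phi_2 - \phi_1)} \langle x, y \rangle\big) + \beta^2 = 1 + \sin 2\theta \operatorname{Re}\big(e^{i\phi} \langle x, y \rangle\big),$$
for $\alpha = \cos\theta$, $\beta = \sin\theta$, $\phi := \phi_2 - \phi_1$. Then $\langle v, Tv \rangle$ works out to


$$\langle \alpha e^{i\phi_1}x + \beta e^{i\phi_2}y,\ \alpha e^{i\phi_1}Tx + \beta e^{i\phi_2}Ty \rangle = \alpha^2\lambda + \alpha\beta\big(e^{i\phi}\langle x, Ty \rangle + e^{-i\phi}\langle y, Tx \rangle\big) + \beta^2\mu$$
$$= \lambda\cos^2\theta + \sin 2\theta\,(w\cos\phi + z\sin\phi) + \mu\sin^2\theta$$
$$= \frac{\lambda + \mu}{2} + \frac{\lambda - \mu}{2}\cos 2\theta + (w\cos\phi + z\sin\phi)\sin 2\theta$$


where $w := \tfrac{1}{2}(\langle x, Ty \rangle + \langle y, Tx \rangle)$, $z := \tfrac{i}{2}(\langle x, Ty \rangle - \langle y, Tx \rangle)$. But $w_\phi := w\cos\phi + z\sin\phi$ traces out an ellipse as $\phi$ varies. By choosing the correct value of $\phi$, $w_\phi$ can be made to point in any direction in the complex plane, including that of $\lambda - \mu$. With this choice, $\langle v, Tv \rangle / \|v\|^2$ gives a line segment as $\theta$ varies, a line that contains $\lambda$ and $\mu$ (at $\theta = 0, \pi/2$). Thus $W(T)$, and its closure $\overline{W(T)}$, are convex sets. $\square$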
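
The inclusion $\sigma(T) \subseteq \overline{W(T)}$ can be strict, as a simple sampling experiment suggests. The sketch below is an illustration, not a definitive implementation: it evaluates $\langle x, Tx \rangle$ over a grid of unit vectors $x = (\cos t, e^{is}\sin t)$ in $\mathbb{C}^2$ (every unit vector has this form up to a phase, which does not affect $\langle x, Tx \rangle$). The function names, the grid size `steps` and the value of `a` are assumptions made for this example.

```python
import cmath
import math

def mean_value(T, x):
    """<x, Tx> for a 2x2 matrix T and a vector x in C^2."""
    Tx = [T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1]]
    return x[0].conjugate() * Tx[0] + x[1].conjugate() * Tx[1]

def numerical_range_samples(T, steps=40):
    """Values <x, Tx> over a grid of unit vectors x = (cos t, e^{is} sin t)."""
    samples = []
    for k in range(steps + 1):
        t = (math.pi / 2) * k / steps
        for m in range(steps):
            s = 2 * math.pi * m / steps
            x = [complex(math.cos(t)), cmath.exp(1j * s) * math.sin(t)]
            samples.append(mean_value(T, x))
    return samples

a = 2 + 1j
jordan = [[a, 1], [0, a]]          # spectrum {a}, but not normal

samples = numerical_range_samples(jordan)
largest_deviation = max(abs(w - a) for w in samples)
```

Here $\langle x, Tx \rangle = a + \tfrac{1}{2}e^{is}\sin 2t$, so the samples fill a disk of radius $1/2$ about the single spectral value $a$; the list `samples` could be passed to any plotting tool to visualize $W(T)$.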


As an immediate corollary, this proposition allows us to identify the self-adjoint 
operators among the normal ones from their spectrum: 
Proposition 15.19


For a normal operator $T$,
(i) $\overline{W(T)} = \mathcal{S}(T)$,
(ii) $T$ is self-adjoint $\iff$ $W(T)$ is real.


Proof (i) By the previous theorem, $\overline{W(T)} \subseteq \mathcal{S}(T)$, so the reverse inclusion remains to be shown. The closure of the numerical range $\overline{W(T)}$ is a convex set containing $\sigma(T)$, so it contains its closed convex hull, which is $\mathcal{S}(T)$ when $T$ is normal (Proposition 15.8).


(ii) When $T \in B(H)$ is self-adjoint, $\overline{\langle x, Tx \rangle} = \langle Tx, x \rangle = \langle x, Tx \rangle$ for all $x \in H$, which implies $W(T) \subseteq \mathbb{R}$. Conversely, if $\langle x, Tx \rangle \in \mathbb{R}$ for all vectors $x$, then


$$\langle Tx, x \rangle = \langle x, Tx \rangle = \langle T^*x, x \rangle$$


which can only hold when $T^* = T$ (Example 10.7(3)). Note that this implies that $T$ is self-adjoint $\Rightarrow$ $\sigma(T) \subseteq \mathbb{R}$, since $W(T)$ would be a line interval. $\square$


Exercises 15.20 
1. $W(T) = \{z\} \Leftrightarrow T = z$.
2. Show that, for the shift operators on $\ell^2$, $W(L) = B_1(0) = W(R)$.


3. Let $T$ be a square matrix $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with respect to an orthonormal basis, where
$A$, $D$ are square sub-matrices.
(a) $W(A) \cup W(D) \subseteq W(T)$.
(b) If $B = C = 0$, then $W(T)$ is the closed convex hull of $W(A) \cup W(D)$.


4. Write a program that plots W(T) for 2 x 2 matrices, and test it on random 
matrices. Verify, and then prove, that W(T) for 


(a) $T := \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ is the line joining $a$ to $b$;
(b) $T := \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}$ is the closed disk $B_{1/2}[a]$ (although its spectrum is $\{a\}$);


(c) *T = ( 7) is generically an ellipse with its interior. 
5. Let $T$ be a square matrix with positive coefficients. If $x = (a_1, \ldots, a_N) \in \mathbb{C}^N$
and $x_+ := (|a_1|, \ldots, |a_N|)$, then


$$|\langle x, Tx\rangle| \le \langle x_+, Tx_+\rangle$$


so that the largest extent of W(T) (and W(7T%)) is a positive real number. 


6. The classical proofs of some of the statements above do not use the convexity 
properties of the numerical range. For a self-adjoint operator T, 


(a) $\sigma(T)$ is real. Prove this by letting $\lambda := \alpha + i\beta$ with $\beta \ne 0$, and showing
$$\|(T - \lambda)x\|^2 = \|(T - \alpha)x\|^2 + |\beta|^2\|x\|^2 \ge |\beta|^2\|x\|^2.$$


(b) $W(T)$ is the smallest interval containing $\sigma(T)$. Show this by taking $\sigma(T) \subseteq
[a, b]$, letting $c := (a + b)/2$, and proving that for any unit vector $x$,


$$|\langle x, Tx\rangle - c| = |\langle x, (T - c)x\rangle| \le b - c = c - a.$$


7. For any $T \in B(H)$, $\overline{W(T^*T)} = [a, b]$, where $a \ge 0$ and $b = \|T\|^2$.
8. If $\lambda \notin \overline{W(T)}$, then $\|(\lambda - T)^{-1}\| \le 1/d(\lambda, W(T))$.


9. A coercive operator $T \in B(H)$ satisfies $|\langle x, Tx\rangle| \ge c > 0$ for all unit $x \in H$.
Show that it has a continuous inverse. An elliptic operator is one which satisfies
$\langle x, Tx\rangle \ge c > 0$, a special case of a coercive self-adjoint operator.


10. Let $\phi : B(H) \to \mathbb{C}$ be defined by $T \mapsto \langle x, Ty\rangle$ for some fixed unit $x, y \in H$;
show that $\phi \in \mathcal{S} \Leftrightarrow x = y$.

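11. (a) $\mathrm{Cov}(I, T) = 0$, $\mathrm{Cov}(S, T + \lambda) = \mathrm{Cov}(S, T)$, $\sigma_{T+\lambda} = \sigma_T$;
(b) For every $\lambda$, $\sigma_T \le \|(T - \lambda)x\|$, so $\sigma_T \le \tfrac12 \operatorname{diam}\sigma(T)$ for $T$ normal;
(c) $\sigma_T = 0 \Leftrightarrow x$ is an eigenvector of $T$, with eigenvalue $\langle T\rangle_x$.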


(d) If S,T are self-adjoint operators, let A := +S, T] and h := (A),/2 = 
Cov(S, T), then 
osorT Sh. 
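
One possible starting point for Exercise 4 above is sketched here; it is not the only way to write such a program. It relies on a standard support-line argument rather than anything stated in the text: for each angle $t$, a unit eigenvector $x$ of the largest eigenvalue of the Hermitian part of $e^{-it}T$ maximizes $\operatorname{Re}(e^{-it}\langle x, Tx\rangle)$, so $\langle x, Tx\rangle$ lies on the boundary of $W(T)$. The function name `numerical_range_boundary` and the sample matrix are illustrative choices.

```python
import numpy as np

def numerical_range_boundary(T, num_angles=360):
    """Sample boundary points of W(T) = {<x, Tx> : ||x|| = 1}.

    For each angle t, the unit eigenvector x of the largest eigenvalue of the
    Hermitian part of e^{-it} T maximizes Re(e^{-it} <x, Tx>), so <x, Tx> is a
    point of W(T) on the supporting line with outward normal e^{it}.
    """
    T = np.asarray(T, dtype=complex)
    points = []
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        R = np.exp(-1j * t) * T
        H = (R + R.conj().T) / 2          # Hermitian part of e^{-it} T
        _, vecs = np.linalg.eigh(H)       # eigenvalues come in ascending order
        x = vecs[:, -1]                   # unit eigenvector of the top eigenvalue
        points.append(np.vdot(x, T @ x))  # <x, Tx>, conjugate-linear in x
    return np.array(points)

# Test matrix of Exercise 4(b): the boundary should be a circle of radius 1/2.
a = 1.0 + 2.0j
jordan = np.array([[a, 1.0], [0.0, a]])
pts = numerical_range_boundary(jordan)
radii = np.abs(pts - a)
```

The returned complex points can be passed to any plotting routine (real part against imaginary part); joining them in order traces the boundary, and convexity of $W(T)$ fills in the interior.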


15.3 The Spectral Theorem for Compact Normal Operators 


As seen before, multiplier operators such as diagonal matrices are normal. In fact, 
all normal operators are of this type; we show this first in the simple case of compact 
normal operators. 
Theorem 15.21 Spectral Theorem for Compact Normal Operators


If $T$ is a compact normal operator on a Hilbert space, then

$$Tx = \sum_{n=0}^{\infty} \lambda_n \langle e_n, x\rangle e_n$$

where $e_n$ are the eigenvectors of $T$ with corresponding non-zero eigenvalues $\lambda_n$.


The statement is written supposing an infinite number of eigenvectors; otherwise 
the sum is finite. 


Proof Let T be a compact normal operator. We show that H has an orthonormal 
basis of eigenvectors. 


(a) The fact that $T$ is compact implies that the non-zero part of its spectrum
consists of a countable set of eigenvalues, and each generalized eigenspace
$X_\lambda := \ker(T - \lambda)^{k_\lambda}$ is finite-dimensional (Theorem 14.18).

(b) The fact that $T - \lambda$ is normal implies, first, that $X_\lambda = \ker(T - \lambda)$ consists
of eigenvectors, and second, that the $X_\lambda$ are orthogonal to each other (Proposition
15.12,13).


Note that the eigenvalues decrease to 0 (unless there are a finite number of them). 
This is part of Theorem 14.18, but its proof in the present context is much simpler: 
As $T$ is compact, for any infinite set of orthonormal eigenvectors $e_n$, $Te_n$ ($= \lambda_n e_n$)
has a Cauchy subsequence, so

$$|\lambda_n|^2 + |\lambda_m|^2 = \|\lambda_n e_n - \lambda_m e_m\|^2 = \|Te_n - Te_m\|^2 \to 0, \quad \text{as } n, m \to \infty$$

implying both $\lambda_n \to 0$ and that each eigenspace $\ker(T - \lambda)$ is finite-dimensional.

Thus a countable number of orthonormal eigenvectors $e_n$ (a finite number from
each $X_\lambda$) account for all the non-zero eigenvalues, and form an orthonormal basis
for the closed space $M := [[e_1, e_2, \ldots]]$ generated by them. $M^\perp$ is $T$-invariant since
$x \in M^\perp$ implies that for all $n$, $\langle e_n, x\rangle = 0$, and as $T^*e_n = \bar\lambda_n e_n$,

$$\langle e_n, Tx\rangle = \langle T^*e_n, x\rangle = \lambda_n\langle e_n, x\rangle = 0.$$

Thus $T$ can be restricted to $M^\perp$, when it remains compact (Exercise 6.9(5)) and
normal, yet without non-zero eigenvalues, because those are all accounted for by
the eigenvectors in $M$. Its spectrum must therefore be 0, implying $T|_{M^\perp} = 0$, i.e.,
$M^\perp = \ker T$. Unless $M^\perp = 0$, there is an orthonormal basis of eigenvectors $\tilde e_\alpha$ for
it, and collectively with $e_n$, form a basis for $H = M \oplus M^\perp$:

$$x = \sum_n \langle e_n, x\rangle e_n + \sum_\alpha \langle \tilde e_\alpha, x\rangle \tilde e_\alpha.$$

Finally, since $T$ is linear and continuous, and $T\tilde e_\alpha = 0$, we find that

$$Tx = T\Bigl(\sum_n \langle e_n, x\rangle e_n\Bigr) = \sum_n \langle e_n, x\rangle Te_n = \sum_n \langle e_n, x\rangle \lambda_n e_n. \qquad \square$$

Corollary 15.22 Spectral Theorem in Finite Dimensions 


A normal complex matrix is diagonalizable. 


There is a remarkable generalization of this diagonalization to any compact oper- 
ator between Hilbert spaces, including rectangular matrices: 


Theorem 15.23 Singular Value Decomposition (SVD) 


If $T : X \to Y$ is a compact operator between Hilbert spaces, then there
are isometry operators $U : Y \to Y$ and $V : X \to X$ such that $T = UDV^*$
with $D$ diagonal.


Proof T*T and TT* are compact self-adjoint operators, on X and Y respectively. 
They share the same non-zero eigenvalues (Examples 14.10(5)), which are strictly
positive, since if $T^*Tv = \lambda v$, $\|v\| = 1$, then

$$\lambda = \langle v, T^*Tv\rangle = \|Tv\|^2 \ge 0.$$

By the spectral theorem there is an orthonormal set of eigenvectors $v_n \in X$ of $T^*T$
with eigenvalues $\lambda_n = \sigma_n^2 > 0$. It turns out that the vectors $Tv_n \in Y$ are also
orthogonal,

$$\langle Tv_m, Tv_n\rangle = \langle v_m, T^*Tv_n\rangle = \sigma_n^2\,\delta_{mn},$$

so $u_n := Tv_n/\sigma_n$ form an orthonormal set in $Y$. Note that, by the above,

$$Tv_n = \sigma_n u_n, \qquad T^*u_n = \sigma_n v_n.$$


The positive numbers $\sigma_n$ are called the singular values of $T$ and $v_n$, $u_n$ are called its
singular vectors ($u_n$ are also called the principal components of $T$). In fact, $v_n$ form
an orthonormal basis for $(\ker T^*T)^\perp = (\ker T)^\perp = \overline{\operatorname{im} T^*}$, and similarly $u_n$ is an
orthonormal basis for $\overline{\operatorname{im} T}$ (Exercise 10.26(8) and Proposition 10.21).

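It follows that for any $x \in X$ and $y \in Y$,

$$x = Px + \sum_n \langle v_n, x\rangle v_n, \qquad Tx = \sum_n \sigma_n\langle v_n, x\rangle u_n, \qquad T^*y = \sum_n \sigma_n\langle u_n, y\rangle v_n$$

where $P \in B(X)$ is the orthogonal projection onto $\ker T$. Indeed a stronger statement
is true:

$$T = \sum_n \sigma_n u_n v_n^*.$$

That is, the convergence is in norm, not just pointwise, the reason being

$$\Bigl\|\Bigl(T - \sum_{n=1}^{N}\sigma_n u_n v_n^*\Bigr)x\Bigr\|^2 = \Bigl\|\sum_{n=N+1}^{\infty}\sigma_n\langle v_n, x\rangle u_n\Bigr\|^2 = \sum_{n=N+1}^{\infty}\sigma_n^2|\langle v_n, x\rangle|^2 \le \bigl(\max_{n>N}\sigma_n^2\bigr)\|x\|^2$$

and $\max_{n>N}\sigma_n \to 0$ as $N \to \infty$ since $\sigma_n \to 0$ as $n \to \infty$.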

Let $U$ be that operator representing a change of basis in $\operatorname{im} T$ from $u_n$ to some
arbitrary basis (leaving the perpendicular space $\ker T^*$ invariant), $V$ a similar change
of basis in $\operatorname{im} T^*$ from $v_n$. Then the ‘matrix’ of $T$ with respect to $v_n$ and $u_n$ is
$D := U^*TV$; as $Tv_n = \sigma_n u_n$ and $Tx = 0$ for $x \in \ker T$, $D$ is diagonal. □
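
The relations in this proof are easy to verify for a rectangular matrix. The following sketch uses `numpy.linalg.svd` on a random complex matrix (the matrix and its size are arbitrary choices); it checks $Tv_n = \sigma_n u_n$, $T^*u_n = \sigma_n v_n$, that $\sigma_n^2$ are the eigenvalues of $T^*T$, and that truncating the sum after $N$ terms leaves an operator-norm error equal to the next singular value, consistent with the bound above.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(6, 4)) + 1j * rng.normal(size=(6, 4))   # T : C^4 -> C^6

# numpy returns T = U diag(s) Vh with s decreasing; the columns of U are the
# u_n and the rows of Vh are the v_n^*.
U, s, Vh = np.linalg.svd(T, full_matrices=False)
V = Vh.conj().T

# T v_n = sigma_n u_n and T* u_n = sigma_n v_n (column scaling by broadcasting)
forward_ok = np.allclose(T @ V, U * s)
adjoint_ok = np.allclose(T.conj().T @ U, V * s)

# sigma_n^2 are the eigenvalues of T*T
gram_eigs = np.sort(np.linalg.eigvalsh(T.conj().T @ T))[::-1]
gram_ok = np.allclose(gram_eigs, s**2)

# Truncating after N terms leaves an operator-norm error of sigma_{N+1}.
N = 2
T_N = (U[:, :N] * s[:N]) @ Vh[:N, :]
tail_error = np.linalg.norm(T - T_N, 2)     # spectral norm
tail_ok = np.isclose(tail_error, s[N])
```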


Examples 15.24 


1. The spectral theorem is often stated as: If a compact normal operator has “matrix”
$T$ with respect to a given orthonormal basis $\tilde e_n$, then $T = UDU^{-1}$, where $D$ is
diagonal and $U$ is the unitary change-of-basis operator that maps $(\tilde e_n)$ to $(e_n)$, the
orthonormal basis of eigenvectors of $T$.


2. The converse of the spectral theorem is true, i.e., defining the operator

$$Tx := \sum_{n=0}^{\infty} \lambda_n \langle e_n, x\rangle e_n$$

in terms of an orthonormal basis, with $\lambda_n \to 0$, gives a compact normal operator:
compact because it is the limit of finite-rank operators, normal because

$$\|Tx\|^2 = \sum_n |\lambda_n|^2 |\langle e_n, x\rangle|^2 = \|T^*x\|^2.$$


3. Given a compact normal operator in $B(H)$, and any function $f \in C(\sigma(T))$, with
$f(0) = 0$, one can define the compact operator $f(T)$ by the formula

$$f(T)x = \sum_{n=0}^{\infty} f(\lambda_n)\langle e_n, x\rangle e_n.$$

For example,
(a) $\sqrt{T}$ is compact when $T$ is a self-adjoint compact operator with positive
eigenvalues,

(b) for any $\lambda \ne 0$, there is a projection $P_\lambda := f_\lambda(T)$, where $f_\lambda$ is a continuous
function which takes the value 1 around $\lambda$ and 0 around all other eigenvalues.


4. The projections $P_n$ to the eigenspaces $X_{\lambda_n}$ of $T$ commute and are orthogonal, so
$E_n := P_1 + \cdots + P_n$ is a projection onto $X_{\lambda_1} + \cdots + X_{\lambda_n}$ (Exercise 8.17(2)). The
spectral decomposition can be rewritten as $Tx = \sum_n \lambda_n\,\delta E_n x$, where
$\delta E_n := E_n - E_{n-1} = P_n$. This can be seen as a breakup of $T = \frac{1}{2\pi i}\oint_{\sigma(T)} z(z - T)^{-1}\,dz$
into integrals on the disconnected components of the spectrum.


5. According to SVD, any matrix $T$ can be approximated by $\sum_n \sigma_n A_n$ where
$A_n = u_n v_n^*$ and the sum is taken over the largest singular values. Typically,
data from variables $x_1, \ldots, x_n$ is organized in the form of a matrix $T$ with the
rows representing the different variables and the columns the normalized measured
instances; the resulting $u_n$ associated with the largest singular values are
linear combinations of the variables $x_i$ that account for the most variability in the
data.


6. If $T \in B(X)$ is compact normal, then the singular values of $T$ are the absolute
values of its non-zero eigenvalues.
Proof Clearly, if $Tx = \lambda x$ then $T^*Tx = |\lambda|^2 x$. Conversely, if $T^*Tx = \mu x$
($\mu \ne 0$) then

$$0 = (T^*T - \mu)x = \sum_n (|\lambda_n|^2 - \mu)\langle e_n, x\rangle e_n$$

so $\mu = |\lambda_n|^2$ for some $n$.
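
Example 3 translates directly into a few lines of code in finite dimensions. The sketch below assumes a Hermitian matrix (so that `numpy.linalg.eigh` supplies an orthonormal eigenbasis); the helper name `apply_function` is illustrative. It forms $\sqrt{T}$ as in (a), and a spectral projection as in (b) using an indicator of one eigenvalue in place of a continuous bump function.

```python
import numpy as np

def apply_function(T, f):
    """Return f(T) = sum_n f(lam_n) e_n e_n^* for a Hermitian matrix T."""
    lam, E = np.linalg.eigh(T)            # orthonormal eigenvectors in columns
    return (E * f(lam)) @ E.conj().T      # scale column n by f(lam_n)

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = A.conj().T @ A                        # self-adjoint with positive eigenvalues

root = apply_function(T, np.sqrt)         # Example 3(a): the square root of T
root_ok = np.allclose(root @ root, T)

# Example 3(b): a function equal to 1 near one eigenvalue and 0 near the
# others picks out the corresponding spectral projection.
target = np.linalg.eigvalsh(T)[-1]
P = apply_function(T, lambda z: np.isclose(z, target).astype(float))
projection_ok = np.allclose(P @ P, P) and np.allclose(T @ P, target * P)
```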
Exercises 15.25 


, : 23 110 
1. Find the singular values and vectors of ( 0 and ( 01 i). 
2. If $S$ and $T$ are commuting self-adjoint compact operators, then they are
simultaneously diagonalizable (Hint: consider $S + iT$).


3. (a) Let $T$ be an $n \times n$ self-adjoint matrix, with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$
(including repeated eigenvalues), and corresponding orthonormal eigenvectors
$v_1, \ldots, v_n$. If $M$ is a closed linear subspace, with orthogonal projection
$P$, then the restriction of $PTP$ to $M$ is also self-adjoint with eigenvalues, say,
$\mu_1 \le \cdots \le \mu_m$, and corresponding orthonormal eigenvectors $u_1, \ldots, u_m$.
Taking a unit vector $x \in [[u_1, \ldots, u_i]] \cap [[v_i, \ldots, v_n]] \ne 0$, we get

$$\mu_1 \le \langle x, Tx\rangle \le \mu_i \quad\text{and}\quad \lambda_i \le \langle x, Tx\rangle \le \lambda_n.$$

It follows that $\lambda_i \le \mu_i$. Similarly, take $x \in [[u_i, \ldots, u_m]] \cap [[v_1, \ldots, v_{i+n-m}]]
\ne 0$ to deduce $\mu_i \le \lambda_{i+n-m}$. Combining the results we get

$$\lambda_i \le \mu_i \le \lambda_{n-m+i}.$$

(b) Interlacing theorem: If the $k$th row and column of a self-adjoint matrix are
removed, the new eigenvalues $\mu_i$ are interlaced with the old ones $\lambda_i$:

$$\lambda_1 \le \mu_1 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le \mu_{n-1} \le \lambda_n.$$


4. Picard’s criterion: Suppose $T \in B(X, Y)$ is a compact operator on Hilbert spaces
$X$, $Y$, having singular values $\sigma_n$ and singular vectors $v_n$, $u_n$. In solving $Tx = y$,
we find that $\langle u_n, y\rangle = \sigma_n\langle v_n, x\rangle$ for all $n$. A necessary condition is $(\langle u_n, y\rangle/\sigma_n) \in
\ell^2$ as well as $y \in (\ker T^*)^\perp$. Thus the coefficients of $y$ must ‘diminish faster’
than $\sigma_n$.


5. Truncated Singular Value Decomposition (TSVD) The series solution

$$x = \sum_n \frac{\langle u_n, y\rangle}{\sigma_n}\, v_n$$

of $T^*Tx = T^*y$ need not converge in general. Even if it does, any small errors in
$\langle u_n, y\rangle$ are magnified as $\sigma_n \to 0$. In practice, the series is truncated at some stage
to avoid this. The cutoff point is best taken when the error in $y$ becomes appreciable
compared to $\sigma_n$. Use the Tikhonov regularization method (Section 10.5) to derive
another way of doing this (for the right choice of $\alpha$),


But any other weighting $\sum_n w_n \frac{\langle u_n, y\rangle}{\sigma_n} v_n$, where $w_n$ vanishes sufficiently rapidly
as $\sigma_n \to 0$, is just as valid.


6. It is instructive to compare with the case of solving the equation $(T - \lambda)x = y$
where $T$ is compact in $B(H)$ and $0 \ne \lambda \in \sigma(T)$ (the case $\lambda \notin \sigma(T)$ is trivial). It
has a solution $\Leftrightarrow$ $y \in \ker(T - \lambda)^\perp$. That solution of minimum norm is then

$$x = \sum_n \frac{\langle e_n, y\rangle}{\lambda_n - \lambda}\, e_n - y_0/\lambda,$$

where the sum is taken over $\lambda_n \ne \lambda, 0$, and $y_0$ is the projection of $y$ to $\ker T$.
There is no issue of convergence of the series as $|\lambda_n - \lambda| \ge c > 0$.


7. * If $T$ is a compact normal operator, then the iteration $v_{n+1} := Tv_n/\|Tv_n\|$
(starting from a generic vector $v_0$) converges to an eigenvector of the largest
eigenvalue, if this is unique and strictly positive. What happens otherwise?
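
The effect described in Exercises 4 and 5 can be seen in a small synthetic experiment. Everything specific below is an assumption made for illustration: the singular values $10^{-n}$, the noise level, the cutoff rule, and the Tikhonov-type weights $w_n = \sigma_n^2/(\sigma_n^2 + \alpha)$, which are one common form and may be parametrized differently from Section 10.5.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns u_n
V, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns v_n
sigma = 10.0 ** -np.arange(n)                  # singular values 1, 0.1, ..., 1e-9
T = (U * sigma) @ V.T

x_true = V @ (0.5 ** np.arange(n))             # coefficients <v_n, x> = 2^{-n}
noise_level = 1e-6
y = T @ x_true + noise_level * rng.normal(size=n)

coeffs = (U.T @ y) / sigma                     # <u_n, y> / sigma_n

def weighted_solution(weights):
    """Return sum_n w_n (<u_n, y> / sigma_n) v_n."""
    return V @ (weights * coeffs)

x_naive = weighted_solution(np.ones(n))        # full series, no regularization
x_tsvd = weighted_solution((sigma > 30 * noise_level).astype(float))
alpha = noise_level                            # ad hoc choice for this demo
x_tikhonov = weighted_solution(sigma**2 / (sigma**2 + alpha))

err = {name: np.linalg.norm(x - x_true)
       for name, x in [("naive", x_naive), ("tsvd", x_tsvd), ("tikhonov", x_tikhonov)]}
```

In this setup the unweighted series amplifies the noise in the last coefficients by factors up to $10^9$, while both weightings discard or damp exactly those terms.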
Ideals of Compact Operators


Another way of looking at the spectral theorem (or even the singular value decom- 
position), is the following: 


Proposition 15.26 


Any compact operator on a separable Hilbert space can be approximated 
by a square matrix. 


A compact normal operator on a separable complex Hilbert space can be 
approximated by a diagonalizable matrix. 


Proof An operator $T \in B(H)$ takes the matrix form, in terms of a countable
orthonormal basis $e_i$ of $H$,

$$\begin{pmatrix} P_nTP_n & P_nT(I - P_n) \\ (I - P_n)TP_n & (I - P_n)T(I - P_n) \end{pmatrix}$$

where $P_n$ is the self-adjoint/orthogonal projection onto $[[e_1, \ldots, e_n]]$ (Example
15.14(1)). Note that for any vector $x \in H$, $P_nx \to x$ as $n \to \infty$ (Theorem 10.31).
The claim is that when $T$ is compact, the finite square matrices $P_nTP_n$ converge to
$T$. This is the same as claiming that the other three sub-matrices vanish as $n \to \infty$.

$(I - P_n)T \to 0$: Suppose, for contradiction, that there are unit vectors $x_n$ such
that $\|(I - P_n)Tx_n\| \ge c > 0$. Since $T$ is compact, there is a convergent subsequence
$Tx_n \to x$, hence

$$(I - P_n)Tx_n = (I - P_n)x + (I - P_n)(Tx_n - x) \to 0$$

leads to an impossibility.

$(I - P_n)TP_n \to 0$ and $(I - P_n)T(I - P_n) \to 0$ now follow from $\|P_n\| = 1 =
\|I - P_n\|$. Finally, $T(I - P_n) \to 0$ is also true and follows from $(I - P_n)T^* \to 0$,
since $T^*$ is also a compact operator (Proposition 11.31).

For a compact normal operator, the orthonormal basis $e_i$ can be chosen to consist
of the eigenvectors of $T$ by the Spectral Theorem, in which case $P_nTP_n$ is a diagonal
matrix

$$P_nTP_n = \sum_{i=1}^{n} \lambda_i e_i e_i^*. \qquad \square$$
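
A numerical illustration of this convergence is sketched below. The matrix entries $1/(i+j+1)^2$ are an assumed example of a compact (indeed square-summable) infinite matrix, and a $400 \times 400$ truncation stands in for the infinite matrix, so the numbers are only indicative.

```python
import numpy as np

M = 400                                    # large finite proxy for the infinite matrix
i, j = np.indices((M, M))
T = 1.0 / (i + j + 1.0) ** 2               # square-summable entries (assumed example)

def finite_section_error(n):
    """Operator norm of T - P_n T P_n, with P_n the projection onto e_1..e_n."""
    S = np.zeros_like(T)
    S[:n, :n] = T[:n, :n]                  # P_n T P_n keeps the top-left block
    return np.linalg.norm(T - S, 2)

errors = [finite_section_error(n) for n in (5, 10, 20, 40, 80)]
```

The errors shrink as $n$ grows, as the proposition predicts. By contrast, a bounded but non-compact matrix such as the identity gives $\|I - P_n\| = 1$ for every $n$.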
Proposition 15.27


The compact operators of finite rank acting on a Hilbert space $H$ form
a simple $*$-ideal $\mathcal{K}_F(H)$, which is contained in every non-zero ideal of $B(H)$.


The closure of $\mathcal{K}_F(H)$ in $B(H)$ is the $*$-ideal of compact operators $\mathcal{K}(H)$.


Proof The facts that the sum of compact operators, the product of a compact operator 
with any other operator, and the adjoint of a compact operator, are compact have 
already been proved earlier (Propositions 11.9 and 11.31), so $\mathcal{K}(H)$ is a $*$-ideal
in $B(H)$.

Similarly, it is not difficult to show that the sum of two finite-rank operators, and 
the product (left or right) of a finite-rank operator with any other operator, are again 
finite-rank. The details are left to the reader. 

Let $\mathcal{I}$ be an ideal in $B(H)$ which contains a non-zero operator $S$. There exist non-zero
vectors $a, b$ such that $Sa = b$. For any vectors $x, y \ne 0$, define the operator
$E_{xy} := xy^*/\|y\|^2$, so that $E_{xy}y = x$, but $E_{xy}u = 0$ whenever $u \perp y$. The operator
$E_{xb}SE_{ay}$ has precisely the same effect

$$E_{xb}SE_{ay}y = E_{xb}Sa = E_{xb}b = x, \qquad E_{xb}SE_{ay}u = 0 \quad (u \perp y),$$

so $E_{xy} = E_{xb}SE_{ay} \in \mathcal{I}$. Now let $T$ be any operator on $H$. If $e_1, \ldots, e_n$ are linearly
independent in $(\ker T)^\perp$, then $Te_1, \ldots, Te_n$ remain linearly independent in $\operatorname{im} T$,
for

$$T\Bigl(\sum_i \alpha_i e_i\Bigr) = \sum_i \alpha_i Te_i = 0 \ \Rightarrow\ \sum_i \alpha_i e_i \in \ker T \cap (\ker T)^\perp = 0 \ \Rightarrow\ \alpha_i = 0, \quad i = 1, \ldots, n.$$

Thus, if $T$ is of finite-rank then $(\ker T)^\perp$ is finite-dimensional and has a finite
orthonormal basis $e_1, \ldots, e_N$, say, extended to an orthonormal basis for all of $H$.
Incidentally, this shows that $T^*$ is also of finite rank, since $\operatorname{im} T^* = (\ker T)^\perp$. Given any
vector $x = \sum_n \alpha_n e_n \in H$,

$$Tx = T\Bigl(\sum_n \alpha_n e_n\Bigr) = \sum_{n=1}^{N} \alpha_n Te_n = \sum_{n=1}^{N} \alpha_n E_{Te_n, e_n} e_n = \sum_{n=1}^{N} E_{Te_n, e_n} x$$

so $T$ is a linear combination of operators $E_{Te_n, e_n}$, and belongs to $\mathcal{I}$. We have shown
that $\mathcal{K}_F(H) \subseteq \mathcal{I}$ and $\mathcal{K}_F(H)$ is closed under adjoints.

In particular $\mathcal{K}_F(H)$ contains no proper non-zero ideals; we say it is simple. That the
closure of $\mathcal{K}_F(H)$ is $\mathcal{K}(H)$ is essentially the content of the previous proposition: More
precisely, recall that the image of a compact operator is separable, so $M := \overline{\operatorname{im} T}$
has a countable basis $(e_i)$. Let $P_n$ be the orthogonal projection onto $[[e_1, \ldots, e_n]]$.
Then, as in the proof of the previous proposition, the finite-rank operators $P_nT$
converge to $T$. □


Examples 15.28 


1. The ideal of compact operators, being the closure $\mathcal{K}(H) = \overline{\mathcal{K}_F(H)}$, is contained
in every closed ideal of $B(H)$.


2. The algebra of matrices $B(\mathbb{C}^N) = \mathcal{K}_F(\mathbb{C}^N) = \mathcal{K}(\mathbb{C}^N)$ is simple.


3. } The above argument can be extended to show, more generally, that compact 
operators on a Banach space with a Schauder basis can be approximated by finite- 
rank operators. Spaces for which this is true are said to have the “approximation 
property”; even separable spaces may fail to have this property [41]. 


Hilbert-Schmidt Operators 


Definition 15.29 


The trace of an operator $T$ on a Hilbert space with an orthonormal basis $e_n$
is, when finite,

$$\operatorname{tr}(T) := \sum_n \langle e_n, Te_n\rangle.$$

A Hilbert-Schmidt operator is one such that $\operatorname{tr}(T^*T) = \sum_n \|Te_n\|^2$ is finite.


As defined, the trace of an operator can depend on the choice of orthonormal
basis. But for a Hilbert-Schmidt operator, $\operatorname{tr}(T^*T)$ is well-defined as the proof of the
next proposition shows:


Proposition 15.30 
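If the right-hand traces exist,
$$\operatorname{tr}(S + T) = \operatorname{tr}(S) + \operatorname{tr}(T), \qquad \operatorname{tr}(\lambda T) = \lambda\operatorname{tr}(T), \qquad \operatorname{tr}(T^*) = \overline{\operatorname{tr}(T)}.$$


If $S$, $T$ are Hilbert-Schmidt, then $\operatorname{tr}(ST) = \operatorname{tr}(TS)$.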


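Proof The identities $\operatorname{tr}(S + T) = \operatorname{tr}(S) + \operatorname{tr}(T)$ and $\operatorname{tr}(\lambda T) = \lambda\operatorname{tr}(T)$ follow easily
from the linearity of the inner product and summation, while

$$\operatorname{tr}(T^*) = \sum_n \langle e_n, T^*e_n\rangle = \sum_n \langle Te_n, e_n\rangle = \sum_n \overline{\langle e_n, Te_n\rangle} = \overline{\operatorname{tr}(T)}.$$

Let $e_n$ and $\tilde e_n$ be orthonormal bases for the Hilbert space $H$; then $Te_n =
\sum_m \langle \tilde e_m, Te_n\rangle \tilde e_m$ and $STe_n = \sum_m \langle \tilde e_m, Te_n\rangle S\tilde e_m$, so

$$\operatorname{tr}(ST) = \sum_n \langle e_n, STe_n\rangle = \sum_{n,m} \langle \tilde e_m, Te_n\rangle\langle e_n, S\tilde e_m\rangle = \sum_m \langle \tilde e_m, TS\tilde e_m\rangle, \qquad (15.1)$$

exchanging the order of summation. This would be justified if the convergence is
absolute, which is the case when $S^*$ and $T$ are Hilbert-Schmidt,

$$\sum_{n,m} |\langle \tilde e_m, Te_n\rangle\langle e_n, S\tilde e_m\rangle| \le \sqrt{\sum_{n,m} |\langle \tilde e_m, Te_n\rangle|^2}\,\sqrt{\sum_{n,m} |\langle e_n, S\tilde e_m\rangle|^2} = \sqrt{\sum_n \|Te_n\|^2 \sum_n \|S^*e_n\|^2}, \qquad (15.2)$$

applying the Cauchy-Schwarz inequality and Parseval’s identity. So, putting $S = T^*$
and $\tilde e_n = e_n$ in (15.1) shows that $\operatorname{tr}(T^*T) = \operatorname{tr}(TT^*)$, when $T$ is Hilbert-Schmidt,
i.e., $T^*$ is also Hilbert-Schmidt. This, in turn, implies that when $S$ and $T$ are
Hilbert-Schmidt, (15.2) and (15.1) are satisfied, so $\operatorname{tr}(TS) = \operatorname{tr}(ST)$ (in particular $\operatorname{tr}(T^*T)$)
is independent of the orthonormal basis. □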
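
In finite dimensions every operator is Hilbert-Schmidt, so these identities can be checked directly. The sketch below computes traces as $\sum_k \langle e_k, Ae_k\rangle$ in two different orthonormal bases (the helper `trace_in_basis` and the random matrices are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def trace_in_basis(A, basis):
    """Sum of <e_k, A e_k> over the columns e_k of `basis`."""
    return sum(np.vdot(basis[:, k], A @ basis[:, k]) for k in range(basis.shape[1]))

standard = np.eye(n, dtype=complex)
# A second orthonormal basis: the columns of a unitary matrix from QR.
rotated, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

cyclic_ok = np.isclose(trace_in_basis(S @ T, standard), trace_in_basis(T @ S, standard))
adjoint_ok = np.isclose(trace_in_basis(T.conj().T, standard),
                        np.conj(trace_in_basis(T, standard)))
basis_free_ok = np.isclose(trace_in_basis(T.conj().T @ T, standard),
                           trace_in_basis(T.conj().T @ T, rotated))
hs_norm_ok = np.isclose(trace_in_basis(T.conj().T @ T, rotated).real,
                        np.linalg.norm(T, "fro") ** 2)
```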


Theorem 15.31 


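The Hilbert-Schmidt operators of $B(H)$ form a Hilbert space $\mathcal{HS}$, with
inner product

$$\langle S, T\rangle_{\mathcal{HS}} := \sum_n \langle Se_n, Te_n\rangle$$

which is a $*$-ideal of compact operators, and

$$\|T\| \le \|T\|_{\mathcal{HS}}, \qquad \|ST\|_{\mathcal{HS}} \le \|S\|\,\|T\|_{\mathcal{HS}}.$$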


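Proof Let $e_n$ be an orthonormal basis for $H$. First note that $\|T\|_{\mathcal{HS}} := \sqrt{\langle T, T\rangle_{\mathcal{HS}}} =
\sqrt{\operatorname{tr}(T^*T)}$ is finite for Hilbert-Schmidt operators.


(i) We have remarked in the preceding proposition that if $T \in \mathcal{HS}$ then $T^* \in \mathcal{HS}$,
and

$$\|T^*\|_{\mathcal{HS}} = \sqrt{\operatorname{tr}(TT^*)} = \sqrt{\operatorname{tr}(T^*T)} = \|T\|_{\mathcal{HS}}.$$

The product $\langle S, T\rangle := \operatorname{tr}(S^*T)$ is finite and independent of the choice of
orthonormal basis when $S, T \in \mathcal{HS}$, by (15.1) and (15.2). Moreover, both of the following
traces are finite,

$$\operatorname{tr}(S + T)^*(S + T) = \operatorname{tr}(S^*S) + \operatorname{tr}(S^*T) + \operatorname{tr}(T^*S) + \operatorname{tr}(T^*T)$$
$$\operatorname{tr}(\lambda T)^*(\lambda T) = |\lambda|^2 \operatorname{tr}(T^*T),$$

so that $\mathcal{HS}$ is a vector space.
Linearity and ‘symmetry’ of the product follow from

$$\langle S, T_1 + T_2\rangle = \operatorname{tr}(S^*T_1 + S^*T_2) = \operatorname{tr}(S^*T_1) + \operatorname{tr}(S^*T_2) = \langle S, T_1\rangle + \langle S, T_2\rangle,$$
$$\langle S, \lambda T\rangle = \operatorname{tr}(S^*\lambda T) = \lambda\operatorname{tr}(S^*T) = \lambda\langle S, T\rangle,$$
$$\langle T, S\rangle = \operatorname{tr}(T^*S) = \operatorname{tr}((S^*T)^*) = \overline{\operatorname{tr}(S^*T)} = \overline{\langle S, T\rangle}.$$

That $\|T\| \le \|T\|_{\mathcal{HS}}$ (and hence $\|T\|_{\mathcal{HS}} = 0 \Rightarrow T = 0$) follows from

$$\|Tx\| = \Bigl\|\sum_n \langle e_n, x\rangle Te_n\Bigr\| \le \sum_n |\langle e_n, x\rangle|\,\|Te_n\| \le \sqrt{\sum_n |\langle e_n, x\rangle|^2}\,\sqrt{\sum_n \|Te_n\|^2} = \|x\|\,\|T\|_{\mathcal{HS}}.$$

$\langle\cdot, \cdot\rangle$ is therefore a legitimate inner product on $\mathcal{HS}$.
Finally, $\mathcal{HS}$ is an ideal of $B(H)$, since for any $S \in B(H)$ and $T \in \mathcal{HS}$,

$$\|ST\|_{\mathcal{HS}}^2 = \sum_n \|STe_n\|^2 \le \sum_n \|S\|^2\|Te_n\|^2 = \|S\|^2\|T\|_{\mathcal{HS}}^2,$$
$$\text{and}\quad \|TS\|_{\mathcal{HS}} = \|(TS)^*\|_{\mathcal{HS}} \le \|S^*\|\,\|T^*\|_{\mathcal{HS}} = \|S\|\,\|T\|_{\mathcal{HS}}.$$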


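(ii) Hilbert-Schmidt operators are compact: Given $T \in \mathcal{HS}$, define the finite-rank
operator $T_N$ by

$$T_Ne_n := \begin{cases} Te_n & \text{if } n \le N \\ 0 & \text{if } n > N. \end{cases}$$

$$\|T - T_N\|^2 \le \|T - T_N\|_{\mathcal{HS}}^2 = \sum_{n=1}^{\infty} \|(T - T_N)e_n\|^2 = \sum_{n=N+1}^{\infty} \|Te_n\|^2 \to 0 \quad\text{as } N \to \infty.$$

$T$ is thus the limit of finite-rank operators, making it compact (Proposition 11.9).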


(iii) The space $\mathcal{HS}$ is complete in the HS-norm (but not necessarily in the operator
norm): let $(T_n)$ be an $\mathcal{HS}$-Cauchy sequence

$$\|T_n - T_m\|_{\mathcal{HS}}^2 = \sum_i \|(T_n - T_m)e_i\|^2 \to 0 \quad\text{as } n, m \to \infty,$$

then it is a Cauchy sequence in the operator norm, and thus $T_n \to T$ in $B(H)$.
But writing the Cauchy condition in a slightly different way, the sequences $x_n :=
(\|(T_n - T)e_i\|)_i$ form a Cauchy sequence in $\ell^2$,

$$\|x_n - x_m\|_{\ell^2}^2 = \sum_i \bigl|\,\|(T_n - T)e_i\| - \|(T_m - T)e_i\|\,\bigr|^2 \le \sum_i \|(T_n - T_m)e_i\|^2 \to 0,$$

as $n, m \to \infty$; so $x_n$ converges to some sequence $(a_i) \in \ell^2$. Combining $T_ne_i \to Te_i$
with $\|T_ne_i - Te_i\| \to a_i$ for all $i$, each $a_i$ must be 0, and

$$\|T_n - T\|_{\mathcal{HS}}^2 = \sum_i \|(T_n - T)e_i\|^2 = \|x_n\|_{\ell^2}^2 \to 0 \quad\text{as } n \to \infty,$$

so $T_n \to T$ in $\mathcal{HS}$, and $T \in \mathcal{HS}$ since $\|T\|_{\mathcal{HS}} \le \|T - T_n\|_{\mathcal{HS}} + \|T_n\|_{\mathcal{HS}} < \infty$. □
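
For matrices the HS-norm is the Frobenius norm, so the inequalities of the theorem can be spot-checked numerically. The following sketch uses random complex matrices; the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def op_norm(A):
    return np.linalg.norm(A, 2)        # largest singular value

def hs_norm(A):
    return np.linalg.norm(A, "fro")    # sqrt(tr(A* A))

def hs_inner(A, B):
    return np.trace(A.conj().T @ B)    # <A, B>_HS = tr(A* B)

tol = 1e-12
norm_ok = op_norm(T) <= hs_norm(T) + tol
adjoint_ok = np.isclose(hs_norm(T.conj().T), hs_norm(T))
ideal_ok = hs_norm(S @ T) <= op_norm(S) * hs_norm(T) + tol
cauchy_schwarz_ok = abs(hs_inner(S, T)) <= hs_norm(S) * hs_norm(T) + tol
inner_norm_ok = np.isclose(hs_inner(T, T).real, hs_norm(T) ** 2)
```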


Having established a theory of Hilbert-Schmidt operators, we now exhibit an 
important specific example: 


Theorem 15.32 


If $k \in L^2(\mathbb{R}^2)$, then the operator on $L^2(\mathbb{R})$

$$Tf(x) := \int k(x, y) f(y)\,dy,$$

is Hilbert-Schmidt with $\|T\|_{\mathcal{HS}} = \|k\|_{L^2}$.


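Proof Let $e_n(x)$ be any orthonormal basis for $L^2(\mathbb{R})$. Then any function of $x$ in
$L^2(\mathbb{R})$ can be written as a sum of these basis functions. Analogously any function
of two variables $x, y$ in $L^2(\mathbb{R}^2)$ can be written as a sum (convergent in $L^2(\mathbb{R}^2)$)

$$k(x, y) = \sum_{m,n} \alpha_{n,m}\, e_n(x)\overline{e_m(y)},$$

by first fixing $y$ and expanding in terms of $e_n(x)$ and then treating the result as a
function of $y$. Write $e_n \otimes \bar e_m$ for the basis functions $(x, y) \mapsto e_n(x)\overline{e_m(y)}$. They are
orthonormal, since

$$\langle e_{n'} \otimes \bar e_{m'}, e_n \otimes \bar e_m\rangle = \iint \overline{e_{n'}(x)}\,e_{m'}(y)\,e_n(x)\,\overline{e_m(y)}\,dx\,dy = \langle e_{n'}, e_n\rangle\,\overline{\langle e_{m'}, e_m\rangle} = \delta_{n'n}\delta_{m'm}.$$

By Parseval’s identity $\|k\|_{L^2}^2 = \iint |k(x, y)|^2\,dx\,dy = \sum_{n,m} |\alpha_{n,m}|^2$. Clearly,

$$\langle e_n, Te_m\rangle = \iint \overline{e_n(x)}\,k(x, y)\,e_m(y)\,dx\,dy = \langle e_n \otimes \bar e_m, k\rangle_{L^2(\mathbb{R}^2)} = \alpha_{n,m},$$

so

$$\|T\|_{\mathcal{HS}}^2 = \sum_m \|Te_m\|^2 = \sum_{n,m} |\langle e_n, Te_m\rangle|^2 = \sum_{n,m} |\alpha_{n,m}|^2 = \|k\|_{L^2}^2. \qquad \square$$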


Examples 15.33 


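1. For a matrix $T = [T_{ij}]$, $\operatorname{tr} T = \sum_i T_{ii}$, $\|T\|_{\mathcal{HS}}^2 = \sum_{i,j} |T_{ij}|^2$, $\langle S, T\rangle_{\mathcal{HS}} = \sum_{i,j} \overline{S_{ij}}\,T_{ij}$.


2. More generally, for any Hilbert space, and using Parseval’s identity,

$$\|T\|_{\mathcal{HS}}^2 = \sum_i \|Te_i\|^2 = \sum_{i,j} |\langle e_j, Te_i\rangle|^2.$$


3. ▸ For a Hilbert-Schmidt normal operator, $\|T\|_{\mathcal{HS}} = \sqrt{\sum_n |\lambda_n|^2}$, where $\lambda_n$ are
the eigenvalues of $T$. In this case it is evident that $\|T\|_{\mathcal{HS}} \ge \max_n |\lambda_n| = \|T\|$.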


4. Find the eigenvalues and eigenfunctions of the integral operator on $L^2[0, 1]$ with
kernel

$$k(x, y) = \begin{cases} y(1 - x) & 0 \le y \le x \le 1 \\ x(1 - y) & 0 \le x \le y \le 1. \end{cases}$$

Solution. The operator is Hilbert-Schmidt since $|k(x, y)| \le 1$. The eigenvalue
equation is

$$\int_0^x (1 - x)\,y f(y)\,dy + \int_x^1 x(1 - y) f(y)\,dy = \lambda f(x).$$

The eigenfunctions can be assumed to be differentiable, essentially because they
are integrals. Differentiating gives

$$x(1 - x)f(x) - \int_0^x y f(y)\,dy - x(1 - x)f(x) + \int_x^1 (1 - y)f(y)\,dy = \lambda f'(x),$$

and again, $-xf(x) - (1 - x)f(x) = \lambda f''(x)$,

$$f''(x) + \tfrac{1}{\lambda}f(x) = 0, \qquad f(0) = 0 = f(1).$$

The solutions of this differential equation are the eigenfunctions $f_n(x) =
\sin(n\pi x)$ with eigenvalues $\lambda_n = 1/(n^2\pi^2)$.


5. A traceless operator in $B(\mathbb{C}^N)$ has a matrix with a zero diagonal, with respect to
some orthonormal basis.


Proof Let $A$ be an $N \times N$ matrix with $\operatorname{tr} A = 0$. The proof is by induction on $N$.
Since the numerical range of $A$ is convex,

$$0 = \frac{1}{N}\operatorname{tr} A = \frac{1}{N}\sum_{n=1}^{N} \lambda_n \in W(A)$$

where $\lambda_n$ are the eigenvalues of $A$. So there is a unit vector $u$ such that $\langle u, Au\rangle = 0$.
The matrix restricted to $u^\perp$, $\tilde A := A|_{u^\perp}$, is still traceless

$$0 = \operatorname{tr} A = \operatorname{tr}\tilde A + \langle u, Au\rangle = \operatorname{tr}\tilde A.$$

Therefore, by induction, there is an orthonormal basis $e_1, \ldots, e_{N-1}$ of $u^\perp$ in
which $\tilde A$ has zero diagonal, i.e., $\langle e_i, Ae_i\rangle = 0$. This basis, together with $u$, is the
required basis for the whole $N$-dimensional space.


6. ◂ There is a correspondence between various ideals of compact operators and
the sequence spaces of their singular values $(\lambda_n)$:


Finite-rank operators       $\mathcal{K}_F(H)$     $(\lambda_n) \in c_{00}$
Trace-class operators       $Tr(H)$                $(\lambda_n) \in \ell^1$
Hilbert-Schmidt operators   $\mathcal{HS}(H)$      $(\lambda_n) \in \ell^2$
Compact operators           $\mathcal{K}(H)$       $(\lambda_n) \in c_0$
Bounded operators           $B(H)$                 $(\lambda_n) \in \ell^\infty$


where the set of trace-class operators has been added to complete the picture
(Exercise 15.49(11)). More generally, the Schatten-von Neumann class of
operators $C_p$ corresponds to $(\lambda_n) \in \ell^p$. The analogy goes deeper than this:
$\mathcal{K}(H)^* \equiv Tr(H)$ and $Tr(H)^* \equiv B(H)$ (via the functionals $T \mapsto \operatorname{tr}(ST)$).
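
The eigenvalues found in Example 4 and the norm identity of Theorem 15.32 can be checked by discretizing the kernel. The sketch below uses a midpoint-rule (Nyström-type) discretization on $m = 400$ points, which is an assumed numerical scheme rather than anything from the text; the computed values agree with $1/(n^2\pi^2)$ and $\|k\|_{L^2} = 1/\sqrt{90}$ only up to discretization error.

```python
import numpy as np

m = 400
h = 1.0 / m
x = (np.arange(m) + 0.5) * h                     # midpoints of [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")
k = np.minimum(X, Y) * (1.0 - np.maximum(X, Y))  # y(1-x) for y<=x, x(1-y) for x<=y

K = k * h                                        # midpoint-rule matrix of T
eigs = np.sort(np.linalg.eigvalsh(K))[::-1]      # K is real symmetric

n = np.arange(1, 4)
predicted = 1.0 / (n**2 * np.pi**2)              # lambda_n = 1/(n^2 pi^2)

hs_norm = np.sqrt(np.sum(k**2) * h * h)          # approximates ||k||_{L^2}
hs_from_eigs = np.sqrt(np.sum(eigs**2))          # sqrt(sum |lambda_n|^2), Example 3
```

The value $1/\sqrt{90}$ comes from $\sum_n 1/(n^4\pi^4) = 1/90$, combining Examples 3 and 4.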


Exercises 15.34 
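1. (a) $\langle S^*, T^*\rangle_{\mathcal{HS}} = \langle T, S\rangle_{\mathcal{HS}}$,
(b) $\langle RT^*, S\rangle_{\mathcal{HS}} = \langle R, ST\rangle_{\mathcal{HS}} = \langle S^*R, T\rangle_{\mathcal{HS}}$.
2. The closest number to an $n \times n$ matrix $T$ (in the HS-norm) is $\operatorname{tr}(T)/n$. (Hint:
$\lambda - T \perp I$.)


3. The map $x \mapsto M_x$, where $M_x y := xy$, embeds $\ell^2$ into $\mathcal{HS}(\ell^2)$ (isometrically).
More generally, if $x_n \in H$ satisfy $\sum_n \|x_n\|^2 < \infty$, then $T := \sum_n x_n e_n^*$ is
Hilbert-Schmidt with $\|T\|_{\mathcal{HS}}^2 = \sum_n \|x_n\|^2$.


4. The Volterra operator on $L^2[0, 1]$, $Vf(x) := \int_0^x f$, is Hilbert-Schmidt (without
any eigenvalues).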


5. If $k(x, y) = k(x - y)$ for a real function $k(x) \in L^2[0, 1]$ (Example 8.6(5)), then
$Tf := k * f$ is Hilbert-Schmidt, with eigenvalues $\hat{k}(n)$.

6. Find the eigenfunctions and eigenvalues of the HS-compact self-adjoint operators
$Tf := \int_0^1 k(x, y) f(y)\,dy$ (on $L^2[0, 1]$), where
(a) $k(x, y) := x + y$,


Ji l-x<y<l 
Oe eee ee 
T 


peer 1 L 
(c) k(x, y) := min(x, y); deduce that >", Gath? = 56 and Ln w= 


i 
= 


ci 
7. In the original Fredholm theory, it was proved under certain hypotheses that the
equation

$$f(x) + \int_a^b k(x, y) f(y)\,dy = g(x)$$

either has a unique solution, or else the same equation with $g = 0$ admits a
finite number of linearly independent solutions. Show this for $f, g \in L^2(\mathbb{R})$,
$k \in L^2(\mathbb{R}^2)$, using Proposition 14.17.


15.4 Representation Theorems 


We return to a general unital $C^*$-algebra $\mathcal{X}$ and recover some of the previous
propositions in this setting. The aim is to widen the functional calculus for normal elements
and to prove that $\mathcal{X}$ is embedded in $B(H)$ for some Hilbert space $H$.


Proposition 15.35 


For any $\phi \in \mathcal{S}(\mathcal{X})$, $T \in \mathcal{X}$,
$$\phi T^* = \overline{\phi T}, \qquad \widehat{T^*} = \hat{T}^*.$$


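Proof If $A$ is self-adjoint and $t \in \mathbb{R}$, then

$$\|A + it\|^2 = \|(A + it)^*(A + it)\| = \|A^2 + t^2\| \le \|A\|^2 + t^2.$$

(As a matter of fact, equality holds as the accompanying diagram shows.)

Writing $\phi A =: a + ib$, we find

$$|b + t| \le |a + i(b + t)| = |\phi(A + it)| \le \|A + it\| \le \sqrt{\|A\|^2 + t^2},$$
$$(2t + b)\,b \le \|A\|^2 \quad\text{for all } t \in \mathbb{R},$$

so $b = 0$ and $\phi A \in \mathbb{R}$. Note that $\Delta(A) \subseteq \sigma(A) \subseteq \mathcal{S}(A) \subseteq \mathbb{R}$. More generally, for
any $T = A + iB \in \mathcal{X}$, with $A$, $B$ self-adjoint,

$$\phi T^* = \phi(A - iB) = \phi A - i\phi B = \overline{\phi A + i\phi B} = \overline{\phi T}.$$

In particular, every $\psi \in \Delta$ is automatically a $*$-morphism, and

$$\widehat{T^*}(\psi) = \psi T^* = \overline{\psi T} = \hat{T}(\psi)^*. \qquad \square$$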
Theorem 15.36 The Functional Calculus for Normal Elements


When $T$ is normal, $\mathcal{T} := \overline{\mathbb{C}[T, T^*]}$ is a commutative closed $*$-subalgebra
of $\mathcal{X}$, isometrically $*$-isomorphic to $C(\sigma(T))$.


The identity $\widehat{f(T)} = f \circ \hat{T}$ defines a normal element $f(T)$ whenever
$f \in C(\sigma(T))$; then $\sigma(f(T)) = f(\sigma(T))$.


Proof $\mathcal{T}$ is a commutative closed $*$-subalgebra of $\mathcal{X}$: Since $T$ is normal, $T^n(T^*)^m =
(T^*)^mT^n$ (by induction), so it should be obvious that (i) any polynomial in $T$ and
$T^*$ can be written uniquely in the form $\sum_{n,m} a_{n,m}T^n(T^*)^m$, (ii) the product (and
addition) of two polynomials in $T$ and $T^*$ is another polynomial, (iii) this product
commutes, and (iv) the involute of a polynomial $p(T, T^*)$ remains in $\mathcal{T}$,

$$p(T, T^*)^* = \Bigl(\sum_{n,m} a_{n,m}T^n(T^*)^m\Bigr)^* = \sum_{n,m} \overline{a_{n,m}}\,T^m(T^*)^n \in \mathbb{C}[T, T^*].$$

$\mathbb{C}[T, T^*]$ is thus a commutative $*$-subalgebra. The closure of such a subalgebra in $\mathcal{X}$
remains a commutative $*$-subalgebra (Prove!). Note that $\mathcal{T}$ is obviously separable.


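The spectrum of $S \in \mathcal{Y}$, with respect to a closed $*$-subalgebra $\mathcal{Y} \subseteq \mathcal{X}$, is
$\sigma(S)$: Clearly, if $S$ (or $S - \lambda$) is invertible in $\mathcal{Y}$, it remains so in $\mathcal{X}$. Conversely,
if $S$ is invertible in $\mathcal{X}$, then so are $S^*$, $S^*S$ and $SS^*$. But $S^*S$ is self-adjoint, with
a real spectrum (in $\mathcal{Y}$ and $\mathcal{X}$), hence $S^*S + i/n$ is invertible in $\mathcal{Y}$. As $\mathcal{Y}$ is closed
and $(S^*S + i/n)^{-1} \to (S^*S)^{-1}$ in $\mathcal{X}$, as $n \to \infty$, we can deduce $(S^*S)^{-1} \in \mathcal{Y}$.
Similarly $(SS^*)^{-1} \in \mathcal{Y}$, implying $S$ is invertible in $\mathcal{Y}$ (Exercise 15.3(4)).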


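$\hat{T} : \Delta_{\mathcal{T}} \to \sigma(T)$ is a homeomorphism: ($\Delta_{\mathcal{T}}$ is the character space of $\mathcal{T}$.) $\hat{T}$ is
1-1 since suppose $\hat{T}(\psi_1) = \hat{T}(\psi_2)$ for some $\psi_1, \psi_2 \in \Delta_{\mathcal{T}}$, i.e., $\psi_1T = \psi_2T$. Then

$$\psi_1T^* = \overline{\psi_1T} = \overline{\psi_2T} = \psi_2T^*$$
$$\psi_1\,p(T, T^*) = \psi_1\Bigl(\sum_{n,m} a_{n,m}T^n(T^*)^m\Bigr) = \sum_{n,m} a_{n,m}(\psi_2T)^n(\psi_2T^*)^m = \psi_2\,p(T, T^*)$$

for any polynomial $p$; finally, by continuity of $\psi_1$ and $\psi_2$, $\psi_1S = \psi_2S$ for all $S \in \mathcal{T}$,
proving $\psi_1 = \psi_2$. That $\hat{T}$ is onto was proved in Theorem 14.38. It is continuous
because

$$\psi_n \to \psi \ \Rightarrow\ \hat{T}(\psi_n) = \psi_nT \to \psi T = \hat{T}(\psi).$$

So $\hat{T}$ is a homeomorphism since $\Delta_{\mathcal{T}}$ is a compact metric space (Proposition 6.17
and Example 14.35(8)). Hence any $z \in \sigma(T)$ corresponds uniquely to some $\psi \in \Delta_{\mathcal{T}}$
via $z = \hat{T}(\psi) = \psi T$.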
The Gelfand transform $\mathcal{G} : \mathcal{T} \to C(\Delta_{\mathcal{T}}) \cong C(\sigma(T))$ is an isometric $*$-isomorphism:
Recall that $\mathcal{G}$ is a Banach algebra morphism (Theorem 14.37). In
a commutative $C^*$-algebra such as $\mathcal{T}$, every element $S \in \mathcal{T}$ is normal, so $\|\hat{S}\|_C =
\rho(S) = \|S\|$ (Theorem 14.38); furthermore $\widehat{S^*} = \hat{S}^*$, and the Gelfand transform is
an isometric $*$-embedding.

In fact it is onto: for any polynomial $p$, $p(T, T^*)$ is mapped by it to $p(z, \bar{z})$ when
regarded as a function on $\sigma(T)$. By the Stone-Weierstrass theorem, these polynomials
are dense in $C(\sigma(T))$. Hence, since $\mathcal{G}$ is isometric, it extends to $\mathcal{T} \to C(\Delta_{\mathcal{T}})$.

The continuous function calculus: The correspondence between elements in $\mathcal{T}$ and
functions in $C(\Delta_{\mathcal{T}})$ allows us to extend the analytic function calculus established
earlier. For any continuous function $f \in C(\sigma(T))$, the composition $f \circ \hat{T} : \Delta_{\mathcal{T}} \to \mathbb{C}$
corresponds to some (normal) element in $\mathcal{T}$ which is denoted by $f(T)$. By this
definition, $\widehat{f(T)} = f \circ \hat{T}$. The following identities are true because they mirror the
same properties in $C(\Delta_{\mathcal{T}})$,

$$(f + g)(T) = f(T) + g(T), \quad (\lambda f)(T) = \lambda f(T), \quad (fg)(T) = f(T)g(T), \quad \bar{f}(T) = f(T)^*.$$

Finally $\|f(T)\| = \|f\|_C$ is due to $\mathcal{G}$ being an isometry and $g \circ f(T) = g(f(T))$
follows after

$$\sigma(f(T)) = \operatorname{im}\widehat{f(T)} = \operatorname{im} f \circ \hat{T} = f(\operatorname{im}\hat{T}) = f(\sigma(T)). \qquad \square$$


Examples 15.37 


1. To take a simple example, consider a $2 \times 2$ diagonalizable matrix $T$ with distinct
eigenvalues $\lambda_i$ and corresponding orthonormal eigenvectors $v_i$, $i = 1, 2$. Its
character space $\Delta_{\mathcal{T}}$ consists of the two morphisms $\psi_iS := \langle v_i, Sv_i\rangle$ for $S \in \mathcal{T}$. The
Gelfand transform takes $T$ to $(\lambda_1, \lambda_2)$; any other matrix $f(T)$ is simultaneously
‘diagonalized’ to $(f(\lambda_1), f(\lambda_2))$.


2. ▸ For any elements $S_1, S_2 \in \mathcal{T}$,
$$\sigma(S_1 + S_2) \subseteq \sigma(S_1) + \sigma(S_2), \qquad \sigma(S_1S_2) \subseteq \sigma(S_1)\sigma(S_2).$$
Proof As $\mathcal{T}$ is commutative, Theorem 14.38 shows that $\sigma(S) = \Delta_{\mathcal{T}}S$ for any
$S \in \mathcal{T}$. Hence the statements follow from Exercise 14.40(10b).


3. If $S$, $T$ are commuting normal elements, and $f \in C(\sigma(S))$, $g \in C(\sigma(T))$, then
$f(S)g(T) = g(T)f(S)$.


Proof Take polynomials $p$ and $q$, in $z$ and $z^*$, then $p(S, S^*)q(T, T^*) = q(T, T^*)
p(S, S^*)$ since they are sums of terms of the form

$$aS^nS^{*m}T^iT^{*j} = aT^iT^{*j}S^nS^{*m}$$

by an application of Fuglede’s theorem. Taking the limit of polynomials converging
to $f$, $g$ (by the Stone-Weierstrass theorem) gives the required result.
4. The self-adjoint elements of $\mathcal{T}$ correspond to the real-valued functions $f \in
C(\Delta_{\mathcal{T}})$ and form a real Banach algebra, while the unitary elements correspond
to functions with unit absolute value, $|f| = 1$.


5. Every commutative unital $C^*$-algebra is isometrically $*$-isomorphic to $C(\Delta)$,
via the Gelfand map. The algebras $C(K)$, with $K$ a compact metric space, are
therefore typical separable commutative $C^*$-algebras.


Proposition 15.38 


For $T$ normal,
$T$ is unitary $\Leftrightarrow$ $\sigma(T) \subseteq e^{i\mathbb{R}}$,
$T$ is self-adjoint $\Leftrightarrow$ $\sigma(T) \subseteq \mathbb{R}$.


Proof (i) The spectrum of a unitary element $U$ must lie in the closed unit ball since $\|U\| = 1$. Now, $U - \lambda = U(1 - \lambda U^*)$ and $\|\lambda U^*\| = |\lambda|\|U^*\| = |\lambda|$; so $|\lambda| < 1$ implies that $1 - \lambda U^*$, and thus $U - \lambda$, are invertible (Theorem 13.20).

(Equivalently, if $\lambda \in \sigma(U)$ then $\lambda^{-1} \in \sigma(U^{-1}) = \sigma(U^*) = \sigma(U)^*$ and so both $|\lambda|$ and $1/|\lambda|$ are at most 1.)

(ii) We have already seen that $\mathcal{S}(T) \subseteq \mathbb{R}$ when $T$ is self-adjoint, and $\mathcal{S}(T)$ includes $\sigma(T)$. (Alternatively, $e^{iT}$ is unitary (Example 15.5(11)) and the spectral mapping theorem gives $e^{i\sigma(T)} = \sigma(e^{iT}) \subseteq e^{i\mathbb{R}}$. But $|e^{i(a+ib)}| = e^{-b}$ is 1 only when $b = 0$, from which it follows that $\sigma(T) \subseteq \mathbb{R}$.)

(iii) For the converses, let $T$ be normal with $\sigma(T) \subseteq \mathbb{R}$. Writing it as $A + iB$ with $A$, $B$ commuting self-adjoint, we see that $iB = T - A$, so

$\sigma(iB) \subseteq \sigma(T) + \sigma(-A) \subseteq \mathbb{R}$ (Example 2 above),

yet $\sigma(iB) = i\sigma(B) \subseteq i\mathbb{R}$. Thus $\sigma(B) = \{0\}$, $B = 0$, and $T = A$ is self-adjoint.
(Alternatively, we can work with $\mathcal{S}$: if $T$ is normal and $\sigma(T)$ is real, then $\mathcal{S}(T) \subseteq \mathbb{R}$; for any $\phi \in \mathcal{S}$, $\phi(T - T^*) = \phi T - \overline{\phi T} = 0$, hence $T - T^* = 0$.)

(iv) If $T$ is normal with $\sigma(T) \subseteq e^{i\mathbb{R}}$, then

$\sigma(T^*T) \subseteq \sigma(T^*)\sigma(T) = \sigma(T)^*\sigma(T) \subseteq e^{i\mathbb{R}}$.

As $T^*T$ is self-adjoint and has a real spectrum, that leaves only $\pm 1$ as possible spectral values. But $1 + T^*T$ is invertible, otherwise there is a $\psi \in \Delta_{\mathcal{T}}$ such that

$-1 = \psi(T^*T) = \overline{\psi T}\,\psi T = |\psi T|^2$,

a contradiction. So $\sigma(T^*T) = \{1\}$, $1 = T^*T = TT^*$ and $T$ is unitary. □
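The role of normality in Proposition 15.38 can be seen numerically on small matrices. The following is only a sketch, assuming NumPy is available; the matrices `A`, `W`, `N` and the helper names are our own choices for illustration, not taken from the text.

```python
import numpy as np

def adjoint(T):
    return T.conj().T

def is_normal(T):
    return np.allclose(T @ adjoint(T), adjoint(T) @ T)

def is_self_adjoint(T):
    return np.allclose(T, adjoint(T))

def is_unitary(T):
    return np.allclose(adjoint(T) @ T, np.eye(T.shape[0]))

rng = np.random.default_rng(2)
# A random unitary Q, used only to hide the diagonal structure.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

# Normal with real spectrum {2, -1, 0.5}.
A = Q @ np.diag([2.0, -1.0, 0.5]) @ adjoint(Q)
# Normal with spectrum on the unit circle.
W = Q @ np.diag(np.exp(1j * np.array([0.3, 1.0, -2.0]))) @ adjoint(Q)
# Real spectrum {1, 2}, but not normal.
N = np.array([[1.0, 1.0], [0.0, 2.0]])

for name, M in [("A", A), ("W", W), ("N", N)]:
    print(name, is_normal(M), is_self_adjoint(M), is_unitary(M))
```

The matrix `N` shows that a real spectrum alone does not force self-adjointness once normality is dropped.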
Exercises 15.39

1. Find an example of an operator $T$ having a real spectrum, without $T$ being self-adjoint.

2. If $J$ is a *-morphism and $T$ is normal, then $J(f(T)) = f(J(T))$ (first prove, for any polynomial $p$, $J(p(T, T^*)) = p(J(T), J(T)^*)$).

3. » In a C*-algebra, $\mathcal{S}(T) = 0 \Rightarrow T = 0$ (write $T = A + iB$). We say that $\mathcal{S}(\mathcal{X})$ separates points of $\mathcal{X}$: if $T \ne S$, then there is a $\phi \in \mathcal{S}$ such that $\phi T \ne \phi S$.

4. Suppose a C*-algebra has two involutions, $*$ and $\star$ (with the same norm). Show that $T^* = T^\star$ for all $T$; the involution is unique. (Hint: $\phi(T^*) = \overline{\phi T} = \phi(T^\star)$.)

5. Every normal cyclic element is unitary. In particular, the normal elements of a finite subgroup of $\mathcal{G}(\mathcal{X})$ are unitary.

6. The Fourier transform $\mathcal{F} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ is unitary; in fact it is cyclic, $\mathcal{F}^4 = 1$, so that it has four eigenvalues $\pm 1$, $\pm i$. Verify that the following are eigenfunctions: $e^{-\pi x^2}$, $x e^{-\pi x^2}$, $(4\pi x^2 - 1)e^{-\pi x^2}$, $(4\pi x^3 - 3x)e^{-\pi x^2}$.

7. A normal $T$ such that $\|T\| = 1 = \|T^{-1}\|$ is unitary.

8. Normal idempotents are self-adjoint. A normal element $T$ with $\sigma(T) \subseteq \{0, 1\}$ is an idempotent, e.g. when $T$ is normal and $T^{n+1} = T^n$ for some integer $n$.

9. Suppose $M$ is a closed subspace of a Hilbert space which is invariant under a group of unitary operators. Show that $M^\perp$ is also invariant.

10. If $T_n$ are self-adjoint operators and $T_n \to T$ then $T$ is self-adjoint.


Positive Self-Adjoint Elements 


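For $T$, $S$ self-adjoint, let $T \le S$ be defined to mean $\sigma(S - T) \subseteq [0, \infty[$. Equivalently, since $\mathcal{S}(S - T) = \{\phi(S - T) : \phi \in \mathcal{S}(\mathcal{X})\}$ is the closed convex hull of $\sigma(S - T)$ (Proposition 15.8),

$T \le S \Leftrightarrow \forall \phi \in \mathcal{S}(\mathcal{X}),\ \phi T \le \phi S$.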
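For Hermitian matrices the definition can be tested directly, since the spectrum of $S - T$ is its set of eigenvalues. A minimal sketch, assuming NumPy; the function name `is_leq` and the tolerance `tol` are our own choices.

```python
import numpy as np

def is_leq(T, S, tol=1e-12):
    """Return True when T <= S, i.e. the spectrum of S - T lies in [0, oo[."""
    D = S - T
    # The order is only defined between self-adjoint elements.
    assert np.allclose(D, D.conj().T), "S - T must be self-adjoint"
    return bool(np.linalg.eigvalsh(D).min() >= -tol)

T = np.array([[1.0, 0.0], [0.0, -1.0]])
Z = np.zeros((2, 2))
I = np.eye(2)

print(is_leq(Z, I))                 # 0 <= 1
print(is_leq(Z, T), is_leq(T, Z))   # T is comparable with 0 in neither direction
print(is_leq(-I, T), is_leq(T, I))  # but -1 <= T <= 1 = ||T||
```

The small tolerance only absorbs rounding in the eigenvalue computation; it is not part of the definition.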


Proposition 15.40 


The self-adjoint elements form an ordered real Banach space, such that

$T \le S$ AND $R \le Q \Rightarrow T + R \le S + Q$,
$T \le S \Rightarrow R^*TR \le R^*SR \quad \forall R \in \mathcal{X}$.

Proof First note that, by the definition, $T \le S \Leftrightarrow 0 \le S - T \Leftrightarrow T - S \le 0$ ($\Leftrightarrow -S \le -T$), so we might as well consider $A := S - T \ge 0$ and $B := Q - R \ge 0$ in proving some of the assertions.

(i) It is trivially true that self-adjoint elements form a real vector subspace,

$(S + T)^* = S^* + T^* = S + T$, $(\lambda T)^* = \bar{\lambda} T^* = \lambda T$, $\forall \lambda \in \mathbb{R}$.

If $T_n \to T$ with $T_n^* = T_n$, then in the limit, $T^* = T$, so the subspace is closed.

(ii) That $T \le T$ is immediate from $\sigma(0) = \{0\}$. For anti-symmetry, note that

$0 \le A \le 0 \Rightarrow \sigma(A) = \{0\} \Rightarrow \|A\| = \rho(A) = 0 \Rightarrow A = 0$,
so $S \le T \le S \Rightarrow T = S$.

(iii) To facilitate the rest of the proof, we demonstrate

$a \le T \le b \Leftrightarrow \sigma(T) \subseteq [a, b]$ (15.3)

in two parts,

$a \le T \Leftrightarrow \sigma(T) - a = \sigma(T - a) \subseteq [0, \infty[ \Leftrightarrow \sigma(T) \subseteq [a, \infty[$,
$T \le b \Leftrightarrow \sigma(T) - b = \sigma(T - b) \subseteq ]-\infty, 0] \Leftrightarrow \sigma(T) \subseteq ]-\infty, b]$.

In particular, note that $T \le \rho(T) = \|T\|$ and that if $0 \le T \le b$ then $\rho(T) \le b$.

(iv) $A, B \ge 0 \Rightarrow A + B \ge 0$: In general,

$C + D \le \|C + D\| \le \|C\| + \|D\| = \rho(C) + \rho(D)$.

Let $a := \rho(A)$, then $0 \le A \le a$ can be rewritten as $0 \le a - A \le a$ and hence $\rho(a - A) \le a$. Similarly $\rho(b - B) \le b := \rho(B)$, so $(a - A) + (b - B) \le a + b$, or equivalently, $A + B \ge 0$.

(v) A special case of this shows transitivity of the order relation,

$T \le S \le R \Rightarrow 0 \le (R - S) + (S - T) = R - T \Leftrightarrow T \le R$.

(vi) We are not at this stage able to prove the full product-inequality rule as claimed in the proposition. The proof is deferred to the next proposition. Here we show only the simple case when $R$ is scalar, i.e., if $\lambda \ge 0$ and $A = S - T \ge 0$, then $\sigma(\lambda A) = \lambda\sigma(A) \subseteq \mathbb{R}^+$.

The continuous functional calculus allows us to extend the domain of all continuous real functions $f : \mathbb{R} \to \mathbb{R}$ to the set of self-adjoint elements. Two functions in particular stand out:

(i) the positive square root $\sqrt{A}$ when $A \ge 0$, satisfying $(\sqrt{A})^2 = A = \sqrt{A^2}$, and

(ii) $A_+$ for all $A$ self-adjoint, from the function $x_+ := \begin{cases} x & x \ge 0 \\ 0 & x \le 0 \end{cases}$; similarly $A_-$ from $x_- := (-x)_+$. Their sum then gives $|A|$, which corresponds to the function $x \mapsto |x|$.


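Examples 15.41
1. (a) If $-T \le S \le T$ then $\|S\| \le \|T\|$.
(b) If $0 < a \le T \le b$ then $T$ is invertible and $b^{-1} \le T^{-1} \le a^{-1}$.
(c) If $ST \ge 0$ then $TS \ge 0$.
(d) If $S, T \ge 0$ and $ST$ is self-adjoint, then $ST \ge 0$. In particular, $T \ge 0 \Rightarrow T^n \ge 0$.
(e) If $S_n \le T_n$ and $S_n \to S$, $T_n \to T$, then $S \le T$.

Proof (a) $-\|T\| \le S \le \|T\|$, so $\sigma(S) \subseteq [-\|T\|, \|T\|]$ and $\|S\| = \rho(S) \le \|T\|$.

(b) $\sigma(T) \subseteq [a, b]$ does not include 0; $\sigma(T^{-1}) = \sigma(T)^{-1} \subseteq [b^{-1}, a^{-1}]$.

(c) $\sigma(TS)$ is the same as $\sigma(ST)$ except possibly for the inclusion or exclusion of 0. In any case $\sigma(ST) \subseteq \mathbb{R}^+ \Leftrightarrow \sigma(TS) \subseteq \mathbb{R}^+$.

(d) Recall that $ST$ is self-adjoint exactly when $ST = TS$. So, by Exercise 14.40(17), $\sigma(ST) \subseteq \sigma(S)\sigma(T) \subseteq \mathbb{R}^+$.

(e) Let $A_n := T_n - S_n \ge 0$ and $A_n \to A := T - S$. Then $0 \le \phi A_n \to \phi A$ for any $\phi \in \mathcal{S}$, so $\mathcal{S}(A) \subseteq [0, \infty[$.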


2. The set of positive elements is a closed convex ‘cone’ (meaning $T \ge 0$ AND $\lambda \ge 0 \Rightarrow \lambda T \ge 0$), with non-empty interior in the real Banach space of self-adjoints.

Proof The only non-trivial statement is that the cone contains an open set of self-adjoints, namely the unit ball around 1: If $A$ is self-adjoint and $\|A\| < 1$ then $-1 \le A \le 1$, so $1 + A \ge 0$.

3. Positive continuous functions $f : \mathbb{R} \to \mathbb{R}$ give positive elements $f(A) \ge 0$ for $A$ self-adjoint. For example, $A_+$, $A_-$, $|A|$, and $A^2$ are all positive. More generally, for any normal operator $T$ and $f \in C(\mathbb{C}, \mathbb{R}^+)$, $f(T) \ge 0$.

Proof By the functional calculus, $\sigma(f(T)) = f\sigma(T) \subseteq [0, \infty[$.
4. Every self-adjoint element decomposes into two positive elements

(a) $A = A_+ - A_-$, $|A| = A_+ + A_-$,

(b) $A_+A_- = 0$, $A_\pm|A| = A_\pm^2$, $A_\pm A = \pm A_\pm^2$, and $A_+$, $A_-$, $A$ and $|A|$ all commute with each other,

(c) $-A_- \le A \le A_+ \le |A| \le \|A\|$.

Proof The identities $x = x_+ - x_-$, $|x| = x_+ + x_-$, $x_+x_- = 0$, $x_\pm|x| = x_\pm^2$, $x_\pm x = \pm x_\pm^2$ imply (a) and (b). Moreover, $A + A_- = A_+ \ge 0$, $|A| - A_+ = A_+ - A = A_- \ge 0$. Finally, $\sigma(|A|) = \{|\lambda| : \lambda \in \sigma(A)\}$ is bounded above by $\rho(A) = \|A\|$.


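5. By the spectral mapping theorem, the spectral values of $\sqrt{A}$ are the positive square roots of those of $A \ge 0$. Overall there may be an infinite number of square roots of $A$, e.g. for any $z \in \mathbb{C}$, $\begin{pmatrix} z & 1+z \\ 1-z & -z \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

6. $0 \le S \le T \Rightarrow \sqrt{S} \le \sqrt{T}$.

Proof If $T$ is invertible, then $T^{-\frac12} S T^{-\frac12} \le 1$ (Proposition 15.40), so $\|S^{\frac12}T^{-\frac12}\|^2 = \|T^{-\frac12}ST^{-\frac12}\| \le 1$, from which follows $T^{-\frac14}S^{\frac12}T^{-\frac14} \le 1$ and $S^{\frac12} \le T^{\frac12}$.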
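The monotonicity of the square root, and its failure for squares, can be checked on a pair of $2 \times 2$ matrices. This is a numerical sketch only, assuming NumPy; the matrices `S`, `T` and the helper names `apply_fn`, `is_positive` are chosen here for illustration.

```python
import numpy as np

def apply_fn(A, f):
    """f(A) for a Hermitian matrix A, computed from its eigendecomposition."""
    w, P = np.linalg.eigh(A)          # A = P diag(w) P*
    return (P * f(w)) @ P.conj().T    # scales column j of P by f(w_j)

def is_positive(A, tol=1e-10):
    """True when the (Hermitian) matrix A has spectrum in [0, oo[."""
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

S = np.array([[2.0, 1.0], [1.0, 1.0]])
T = np.array([[3.0, 1.0], [1.0, 1.0]])

print(is_positive(S), is_positive(T - S))                        # 0 <= S <= T
print(is_positive(apply_fn(T, np.sqrt) - apply_fn(S, np.sqrt)))  # sqrt(S) <= sqrt(T)
print(is_positive(T @ T - S @ S))                                # S^2 <= T^2 fails here
```

Here $T^2 - S^2$ has determinant $-1$, so it cannot be positive, even though $T - S \ge 0$.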


Proposition 15.42 


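For any $T \in \mathcal{X}$ and $\phi \in \mathcal{S}(\mathcal{X})$,

(i) $T^*T \ge 0$,
(ii) $T \ge 0 \Leftrightarrow T = R^*R$, for some $R \in \mathcal{X}$,
(iii) $\langle S, T\rangle := \phi(S^*T)$ gives a semi-inner product,
(iv) $|\phi(S^*T)|^2 \le \phi(S^*S)\phi(T^*T)$, $|\phi T|^2 \le \phi(T^*T)$,
(v) $|\phi(S^*TS)| \le \phi(S^*S)\|T\|$.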

Proof (i) $T^*T$ is certainly self-adjoint, and can be decomposed as $T^*T = A - B$ where $A, B \ge 0$, $AB = BA = 0$ (Example 4b above). Now

$(TB)^*(TB) = BT^*TB = B(A - B)B = -B^3 \le 0$

and hence $(TB)(TB)^* \le 0$ (Examples 15.41(1c)). Writing $TB = C + iD$, with $C$, $D$ self-adjoint, we find

$0 \le 2(C^2 + D^2) = (TB)^*(TB) + (TB)(TB)^* \le 0$
$0 \le C^2 = -D^2 \le 0$
$C = 0 = D$

so $TB = 0$. But then, $0 = (TB)^*(TB) = -B^3$ forces $B = 0$ and $T^*T = A \ge 0$.

This allows us to conclude the proof of Proposition 15.40(vi). If $T \le S$ let $A := S - T \ge 0$, so for any $R \in \mathcal{X}$, $R^*AR = (\sqrt{A}R)^*(\sqrt{A}R) \ge 0$, i.e. $R^*TR \le R^*SR$.

(ii) Conversely, if $T$ is positive, let $R := \sqrt{T} \ge 0$, so $R^*R = R^2 = T$.
(iii) The product satisfies the following inner-product axioms,

$\langle S, \lambda T_1 + \mu T_2\rangle = \phi(\lambda S^*T_1 + \mu S^*T_2) = \lambda\langle S, T_1\rangle + \mu\langle S, T_2\rangle$,
$\langle T, S\rangle = \phi(T^*S) = \overline{\phi(S^*T)} = \overline{\langle S, T\rangle}$,
$\langle T, T\rangle = \phi(T^*T) \ge 0$ since $T^*T \ge 0$.

However, it need not be definite, i.e., $\phi(T^*T) = 0$ may be possible without $T = 0$.

(iv) This is the Cauchy-Schwarz inequality, which is valid even for semi-definite inner products (Example 10.10(17)). In particular, taking $S = 1$ gives the second inequality.

(v) As $\phi$ preserves inequalities,

$T^*T \le \|T^*T\| = \|T\|^2 \Rightarrow S^*T^*TS \le \|T\|^2 S^*S \Rightarrow \phi(S^*T^*TS) \le \phi(S^*S)\|T\|^2$.

$|\phi(S^*(TS))|^2 \le \phi(S^*S)\phi(S^*T^*TS)$ by (iv),
$\le \phi(S^*S)^2\|T\|^2$. □

Proposition 15.43 


If $J : \mathcal{X} \to \mathcal{Y}$ is an algebraic *-morphism between C*-algebras, then it is continuous with $\|J\| = 1$, and preserves $\le$.

If $J$ is also 1-1, then it is isometric.

By an algebraic *-morphism is meant a map which preserves $+$, $\cdot$, $1$, and $*$.


Proof If $A \ge 0$, then $A = R^*R$ and $J(A) = J(R)^*J(R) \ge 0$. Thus $J$ preserves the order of self-adjoint elements,

$S \le T \Rightarrow J(T - S) \ge 0 \Rightarrow J(S) \le J(T)$.

Now for any $T$ (noting that $J(1) = 1$),

$0 \le T^*T \le \|T\|^2$,
$\therefore\ 0 \le J(T^*T) \le \|T\|^2$,
$\therefore\ \|J(T)\|^2 = \|J(T)^*J(T)\| = \|J(T^*T)\| \le \|T\|^2$.

If $J$ is 1-1, then one can form the ‘inverse’ $J^{-1} : \operatorname{im} J \to \mathcal{X}$. It is automatically an algebraic *-morphism (check!), for example, for any $S = J(T) \in \operatorname{im} J$,

$J^{-1}(S^*) = J^{-1}(J(T)^*) = J^{-1}J(T^*) = T^* = (J^{-1}J(T))^* = J^{-1}(S)^*$,

and so $\|J^{-1}(S)\| \le \|S\|$. Thus $\|T\| \le \|J(T)\| \le \|T\|$ as required.
(Alternatively, defining $|||T||| := \|J(T)\|_{\mathcal{Y}}$ gives a C*-norm on $\mathcal{X}$. But there can only be one C*-norm (Exercise 15.10(6)), so $J$ is an isometry and $\operatorname{im} J$ is closed.) □


Exercises 15.44 


1. $0 \le 1$ (as self-adjoint elements), and the order relation of $\mathbb{R}$ is subsumed in that of the self-adjoint elements. Similarly, in $C[0, 1]$, $f \le g \Leftrightarrow \forall x,\ f(x) \le g(x)$.


0 1-i 
S tee a) 


in $B(\mathbb{C}^2)$. Note that $T \le S$ does not mean “$\sigma(T) \le \sigma(S)$” in general.


3. (a) A diagonal matrix is positive when all its diagonal coefficients are real and positive.
(b) If the coefficients of a real symmetric matrix are positive, it does not follow that it is positive: $\forall i, j,\ A_{ij} \ge 0 \not\Rightarrow A \ge 0$.
(c) But if a real symmetric matrix is dominated by its positive diagonal, meaning $A_{ii} \ge \sum_{j \ne i} |A_{ij}|$, then $A \ge 0$ (Gershgorin’s theorem, Examples 14.10(6)).

4. Show $\operatorname{Re}(T) \ge 0 \Leftrightarrow \operatorname{Re}\mathcal{S}(T) \ge 0$.


5. The similarity between self-adjoints and real numbers is striking. But not every 
property about inequalities of real numbers carries through to self-adjoints: 


(a) Not every two self-adjoints S and T are comparable, e.g. T := ( ; a) 
satisfies neither T < O nor T > 0; 

(b) $0 \le S \le T$ does not imply $S^2 \le T^2$ (unless $S$, $T$ commute), e.g. $S := \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, $T := \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$.

6. In $B(H)$, $S \le T \Leftrightarrow \langle x, Sx\rangle \le \langle x, Tx\rangle$ for all $x \in H$. In particular, $S^*S \le T^*T \Leftrightarrow \|Sx\| \le \|Tx\|$ for all $x \in H$ (e.g. $T^*T \ge 0$); deduce

(a) If $T$ is compact then so is $S$,

(b) If $T$ is Hilbert-Schmidt, then so is $S$,

(c) For self-adjoint projections in $B(H)$, $P \le Q$ when $\operatorname{im} P \subseteq \operatorname{im} Q$.

7. Prove $S \le T \Rightarrow R^*SR \le R^*TR$ for all $R$, in $B(H)$.

8. In $B(H)$, if $T \ge 0$ then $\langle\langle x, y\rangle\rangle := \langle x, Ty\rangle$ is “almost” an inner product on $H$, except that it need not be definite; it still satisfies the Cauchy-Schwarz inequality though,

$|\langle x, Ty\rangle|^2 \le \langle x, Tx\rangle\langle y, Ty\rangle$.

Conversely, every bounded inner product $\langle\langle\,,\rangle\rangle$ on $H$, in the sense that $|\langle\langle x, y\rangle\rangle| \le c\|x\|\|y\|$, is of this type. Use Example 11.21(1c) to deduce that, for all $x \in H$,
$\|Tx\| \le \sqrt{\|T\|}\sqrt{\langle x, Tx\rangle}$.
In particular, $\langle x, Tx\rangle = 0 \Leftrightarrow Tx = 0$.
9. If $f : \mathbb{R} \to \mathbb{R}$ is increasing and $a \le T \le b$ then $f(a) \le f(T) \le f(b)$.

10. To calculate $f(A)$ for a positive self-adjoint matrix $A$, first diagonalize it, $A = PDP^{-1}$, then work out $f(A) = Pf(D)P^{-1}$. For example,


(to), =3(2:5')- VG5)= (03) 
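The recipe $f(A) = Pf(D)P^{-1}$ translates directly into a few lines of code. The following is a sketch assuming NumPy; the matrices `A` and `B` are our own examples (not necessarily those printed above), and `funm_hermitian` is a hypothetical helper name.

```python
import numpy as np

def funm_hermitian(A, f):
    """Return f(A) = P f(D) P^{-1} for a self-adjoint matrix A."""
    w, P = np.linalg.eigh(A)                 # A = P diag(w) P*, with P unitary
    return P @ np.diag(f(w)) @ P.conj().T    # P^{-1} = P* for unitary P

A = np.array([[1.0, 2.0], [2.0, 1.0]])       # eigenvalues 3 and -1
A_plus = funm_hermitian(A, lambda x: np.maximum(x, 0))    # from x_+
A_minus = funm_hermitian(A, lambda x: np.maximum(-x, 0))  # from x_- = (-x)_+
A_abs = funm_hermitian(A, np.abs)                         # from |x|

B = np.array([[5.0, 4.0], [4.0, 5.0]])       # positive, eigenvalues 9 and 1
B_sqrt = funm_hermitian(B, np.sqrt)

print(np.round(A_plus, 6))
print(np.round(A_minus, 6))
print(np.round(A_abs, 6))
print(np.round(B_sqrt, 6))
```

The outputs can be compared with the identities $A = A_+ - A_-$, $|A| = A_+ + A_-$ and $A_+A_- = 0$ of Examples 15.41(4).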


11. There exists $A^\alpha \ge 0$ for $\alpha > 0$ when $A \ge 0$, for which $(A^\alpha)^{1/\alpha} = A$.

12. If $-1 \le A \le 1$ then $A + i\sqrt{1 - A^2}$ is unitary. Hence any $T \in \mathcal{X}$ is the linear combination of at most four unitary elements. (Hint: $A = (U + U^*)/2$.)

13. Solve the equation $TAT = B$ for the unknown $T \ge 0$, given $A, B \ge 0$ invertible (Hint: $A^{\frac12}TATA^{\frac12} = (A^{\frac12}TA^{\frac12})^2$).

14. Consider $\phi \in \mathcal{X}^*$ which preserves inequalities, $0 \le A \Rightarrow 0 \le \phi A$; it satisfies Proposition 15.42 except that $|\phi T|^2 \le \phi 1\,\phi(T^*T) \le (\phi 1)^2\|T\|^2$. Such positive functionals, as they are called, are positive multiples of states.


15. If J: & — VY is an algebraic *-morphism, then 


X/kerJ =imJ & im J is closed. 


Polar Decomposition An important application of the use of square roots of positive self-adjoint elements is the following generalization of the polar decomposition of complex numbers to $B(H)$:

Proposition 15.45 Polar Decomposition

Every operator $T \in B(H)$ has a decomposition $T = U|T|$, in which $|T| := \sqrt{T^*T} \ge 0$ and $U : \operatorname{im}|T| \to \operatorname{im} T$ is an isometry.

Proof $T^*T$ is positive, so its square root $R := \sqrt{T^*T} \ge 0$ can be defined. $R$ reduces to the previous definition of $|T|$ when $T$ is normal, so it is common to write $|T|$ for $R$. Then $\||T|x\| = \|Tx\|$ for all $x \in H$, as

$\langle |T|x, |T|y\rangle = \langle x, |T|^2y\rangle = \langle x, T^*Ty\rangle = \langle Tx, Ty\rangle$. (15.4)

Let $U : \operatorname{im}|T| \to \operatorname{im} T$ be defined by $U(|T|x) := Tx$; it is well-defined by (15.4),

$|T|(x - y) = 0 \Rightarrow T(x - y) = 0$,

and isometric, so can be extended isometrically to $\overline{\operatorname{im}}|T| \to \overline{\operatorname{im}}\,T$ (Examples 8.9(4)). It can be extended further to the whole of the Hilbert space $H$ by letting $Ux = 0$ whenever $x$ belongs to the orthogonal space $\ker|T|$, in which case it is called a partial isometry. □
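For an invertible matrix the construction in the proof reduces to $|T| = \sqrt{T^*T}$ and $U = T|T|^{-1}$. A minimal numerical sketch, assuming NumPy and an invertible input; the function name `polar` is ours.

```python
import numpy as np

def polar(T):
    """Polar decomposition T = U |T| of an invertible matrix T."""
    w, P = np.linalg.eigh(T.conj().T @ T)          # T*T is positive
    absT = P @ np.diag(np.sqrt(w)) @ P.conj().T    # |T| = sqrt(T*T)
    U = T @ np.linalg.inv(absT)                    # U = T |T|^{-1}
    return U, absT

T = np.array([[1.0, 1.0], [0.0, 1.0]])
U, absT = polar(T)

print(np.round(U, 2))
print(np.round(absT, 2))
```

Rounded to two decimals, the factors agree with the second example listed in Exercises 15.49(1). For a non-invertible $T$ the inverse does not exist and $U$ is only a partial isometry, which this sketch does not handle.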


Furthermore, when $T$ is normal and $U$ is extended to a partial isometry, $T = |T|U$ is also true: $\ker|T| = \ker T$ by (15.4) and since $\ker T^* = \ker T$ (Proposition 15.12),

$\overline{\operatorname{im}}|T| = (\ker|T|)^\perp = (\ker T)^\perp = (\ker T^*)^\perp = \overline{\operatorname{im}}\,T$.

In fact,

for $x \in \ker|T|$, $|T|Ux = 0 = Tx$,
for $x = |T|y \in \operatorname{im}|T|$, $|T|Ux = |T|U|T|y = |T|Ty = T|T|y = Tx$,

and by extension $|T|Ux = Tx$ for $x \in \overline{\operatorname{im}}|T|$ as well.
On the other hand, if $T$ is invertible, then it follows, in succession, that $T^*$, $T^*T$, and $|T|$ are invertible; thus $U$ is an onto isometry on $H$, hence unitary.


Proposition 15.46 


Every unitary operator in $B(H)$ is of the type $e^{iT}$ with $T \in B(H)$ self-adjoint.

The group of invertible operators $\mathcal{G}(H) \subset B(H)$ is connected and generated by the exponentials.

Proof (i) The polar decomposition of any self-adjoint operator $B \in B(H)$ is $B = V|B|$ where

$Vx := \begin{cases} x & x \in \ker B_- \\ -x & x \in (\ker B_-)^\perp = \overline{\operatorname{im}}\,B_- \end{cases}$

since $B_+x \in \ker B_-$ ($B_-B_+ = 0$). Note that $V^2 = I$. Hence

$V|B|x = VB_+x + VB_-x = B_+x - B_-x = Bx$.

Let $U$ be any unitary operator on $H$. It equals $U = A + iB$ where $A$, $B$ are commuting self-adjoint operators such that $A^2 + B^2 = I$. It follows that $A$ commutes with $B_-$ (Example 15.37(3)) and thus preserves $\ker B_-$ and $\overline{\operatorname{im}}\,B_-$ (Exercise 8.10(19)). Accordingly, if $B = V|B|$ is the polar decomposition of $B$, as above, then $V$ commutes with $A$: for all $x = a + b \in \ker B_- \oplus \overline{\operatorname{im}}\,B_-$,

$VAx = VA(a + b) = Aa - Ab = A(a - b) = AVx$.

The function $\arccos : [-1, 1] \to [0, \pi]$ is a continuous function, and $-1 \le A \le 1$, so we can define $C := \arccos A \in B(H)$, and this commutes with $V$. Let $T := VC$, so that $T^2 = V^2C^2 = C^2$. Hence,

$e^{iT} = 1 + iT - \frac{T^2}{2!} - i\frac{T^3}{3!} + \cdots$
$= 1 + iVC - \frac{C^2}{2!} - iV\frac{C^3}{3!} + \cdots$
$= \cos C + iV\sin C$
$= A + iV|B|$ ($\sin\circ\arccos(A) = \sqrt{1 - A^2} = |B|$)
$= U$.

(ii) Consider the polar decomposition of an invertible operator $T = U|T|$, where $U$ is unitary and $|T|$ is invertible. By the above, $U = e^{iA}$, while $|T|$ has a logarithm, $|T| = e^{B}$ (Exercise 14.26(1)). Hence $T = e^{iA}e^{B}$ lies in the connected component of $I$ (Proposition 13.24), which must therefore equal $\mathcal{G}(H)$. □
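Part (i) can be illustrated for matrices. The sketch below, assuming NumPy, does not follow the $\arccos$ construction of the proof: it diagonalizes $U$ through the commuting self-adjoint parts $A$, $B$ of $U = A + iB$ and takes the angles of the eigenvalues. It assumes that the combination $A + \frac12 B$ has distinct eigenvalues, which holds for a generic unitary matrix; `exp_series` is our own helper.

```python
import numpy as np

def exp_series(M, terms=60):
    """Partial sum of the exponential series 1 + M + M^2/2! + ..."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(Z)                       # a random unitary matrix

A = (U + U.conj().T) / 2                     # U = A + iB with A, B self-adjoint
B = (U - U.conj().T) / (2j)
# A and B commute, so a generic real combination has the eigenvectors of U.
_, P = np.linalg.eigh(A + 0.5 * B)           # P is unitary
lam = np.diag(P.conj().T @ U @ P)            # eigenvalues of U, of modulus 1
theta = np.angle(lam)                        # angles in ]-pi, pi]
T = P @ np.diag(theta) @ P.conj().T          # self-adjoint by construction

print(np.allclose(T, T.conj().T))            # T is self-adjoint
print(np.allclose(exp_series(1j * T), U))    # e^{iT} = U
```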


Spectral Theorem for Normal Operators 


There is one further extension of the functional calculus of the C*-algebra $B(H)$: when $T$ is a normal operator, $f(T)$ may be defined even for bounded measurable functions.

Let $1_\Omega$ be the characteristic function defined on a bounded open subset $\Omega \subset \mathbb{C}$. To find an operator that corresponds to $1_\Omega$, we will be needing the following lemma:

Monotone Convergence Theorem for Self-Adjoint Operators: If $A_n \ge 0$ is a decreasing sequence of commuting self-adjoint operators in $B(H)$ then $A_n$ converges strongly to some operator $A \ge 0$.

Proof It is easy to show that when $0 \le S \le T$ commute,

$S \le T \Rightarrow S^2 \le ST \le T^2$.

From this it follows that $A_n^2$ is also a decreasing sequence, as is $\|A_nx\|$ by Example 6 above. Also $\|A_nx - A_mx\|^2 \le |\|A_mx\|^2 - \|A_nx\|^2| \to 0$ as $n, m \to \infty$, since $A_nA_m \ge A_n^2$ for $n > m$, so $(A_nx)$ is a Cauchy sequence in $H$. Now apply the corollary of the uniform boundedness theorem (Corollary 11.35). □

It follows easily from this that an increasing sequence of bounded self-adjoint operators $A_n \le c$ converges strongly to some operator $A \le c$.

There exist increasing sequences of positive continuous functions $f_n : \mathbb{C} \to \mathbb{R}^+$ which converge pointwise to $1_\Omega$; for example, take $f_n(z) := \min(1, n\,d(z, \Omega^c))$. Using the continuous functional calculus defined in Theorem 15.36, $f_n(T)$ exist as positive self-adjoint operators on $H$ with norm equal to $\|f_n(T)\| = \|f_n\|_C \le 1$.

We can therefore define $1_\Omega(T)x := \lim_{n\to\infty} f_n(T)x$ for all $x \in H$. For any closed subset $F$ of $\mathbb{C}$, there are nested open sets $U_n$ such that $F = \bigcap_n U_n$. So $1_F(T)$ can be defined by $1_F(T)x := \lim_{n\to\infty} 1_{U_n}(T)x$ by the monotone convergence theorem above. Some properties of $1_\Omega(T)$ are:


1. $1_\Omega(T)$ is an orthogonal projection; so $1_\Omega(T) \ge 0$.
Proof Write $A_n := f_n(T)$ and $A := 1_\Omega(T)$. Then

$\langle Ay, x\rangle = \lim_{n\to\infty}\langle A_ny, x\rangle = \lim_{n\to\infty}\langle y, A_nx\rangle = \langle y, Ax\rangle$,
$\|(A_n^2 - A^2)x\| = \|(A_n + A)(A_n - A)x\| \le (1 + \|A\|)\|(A_n - A)x\| \to 0$.

Thus $1_\Omega(T)^* = 1_\Omega(T)$ is self-adjoint, and hence orthogonal (Example 15.14(1)).

2. (a) If $U$, $V$ are disjoint open sets, then $1_U(T) + 1_V(T) = 1_{U\cup V}(T)$,

(b) $1_{U\cap V}(T) = 1_U(T)1_V(T)$.
Proof If $f_n(z) \to 1_U(z)$ and $g_n(z) \to 1_V(z)$ for $z \in \mathbb{C}$ then $f_n(z) + g_n(z) \to 1_U(z) + 1_V(z) = 1_{U\cup V}(z)$. So by the continuous functional calculus and the strong convergence of $f_n(T)$ and $g_n(T)$, it follows that $f_n(T)x + g_n(T)x \to 1_{U\cup V}(T)x$ for any $x \in H$.
Similarly, the second statement results from $f_n(z)g_n(z) \to 1_U(z)1_V(z) = 1_{U\cap V}(z)$.

3. $1_\emptyset(T) = 0$, $1_{\sigma(T)}(T) = I$ (since if $\sigma(T) \subseteq U$ and $f_n \to 1_U$, then $f_n|_{\sigma(T)} = 1$ for $n$ large enough).


The projections $1_E(T)$ for Borel sets $E$ are defined by the same procedure and are said to be the spectral measure associated with $T$. We gloss over the details of the exact definition (see [10]).

One can now follow the same steps of creating the space of step functions through to $L^1(\mathbb{C})$, but starting from the projections $1_E(T)$ as ‘step functions’. The end result is a functional calculus in which $f(T)$ is defined for any complex-valued $f \in L^\infty(\sigma(T))$: if $f$ is approximated by $\sum_i a_i 1_{U_i}$, then $f(T)$ is approximately $\sum_i a_i 1_{U_i}(T)$. Indeed, $f(T)$ is still meaningful even if $f \in L^1(\sigma(T))$ but need not be a “bounded” (i.e., continuous) operator.


Proposition 15.47 von Neumann’s Spectral Theorem 


For any normal operator $T$ and $f \in L^\infty(\sigma(T))$, there is a spectral measure $E_\lambda$ such that

$f(T) = \int_{\sigma(T)} f(\lambda)\,dE_\lambda$.
Proof For any $x, y \in H$, define $\mu_{x,y}(U) := \langle x, 1_U(T)y\rangle$ for any open bounded subset $U \subseteq \mathbb{C}$. By the properties proved above, $\mu_{x,y}$ can be extended to a measure with support equal to $\sigma(T)$. (It is not a Lebesgue measure on $\mathbb{C}$ as it is not translation invariant, but Borel sets are $\mu_{x,y}$-measurable.) It has the additional properties:

$\mu_{x,y_1+y_2} = \mu_{x,y_1} + \mu_{x,y_2}$, $\mu_{x,\lambda y} = \lambda\mu_{x,y}$, $\mu_{y,x} = \overline{\mu_{x,y}}$, $0 \le \mu_{x,x} \le \|x\|^2$.

It follows that for any $f \in L^\infty(\sigma(T))$, $\langle\langle x, y\rangle\rangle := \int_{\sigma(T)} f\,d\mu_{x,y}$ is a semi-inner-product which is bounded in the sense $|\langle\langle x, y\rangle\rangle| \le \|f\|_{L^\infty}\|x\|\|y\|$. Thus, by Exercise 15.44(8), $\langle\langle x, y\rangle\rangle = \langle x, Sy\rangle$ for some continuous operator $S$ which we henceforth call $f(T)$,

$\langle x, f(T)y\rangle = \int_{\sigma(T)} f\,d\mu_{x,y}$.


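$f(T)$ agrees with the earlier definition for $f \in C(\sigma(T))$: Any such $f$ is uniformly continuous, so for $\delta$ small enough $fB_\delta(z) \subseteq B_\epsilon(f(z))$, independently of $z \in \sigma(T)$. Let $B_i$ be squares, with centers $\lambda_i$ and diameter less than $\delta$, which partition $\sigma(T)$; one can find slightly smaller closed squares $A_i \subset B_i$ and slightly larger open squares $C_i \supset B_i$, such that $\sum_i \mu_{x,y}(C_i \setminus A_i) < \epsilon$. Moreover, one can find continuous functions $h_i$ such that $1_{A_i} \le h_i \le 1_{C_i}$ and $\sum_i h_i = 1$; for example, let $h_i(s, t) := h(s)h(t)$ where $h(t) = \min(1, r\,d(t, I^c))$ is a continuous real function with support equal to $\bar{I}$ and taking the value 1 just inside it. Then (writing $\mu = \mu_{x,y}$)

$\langle x, f(T)y\rangle = \sum_i \langle x, fh_i(T)y\rangle \approx \sum_i f(\lambda_i)\langle x, h_i(T)y\rangle \approx \sum_i f(\lambda_i)\mu(B_i)$.

More rigorously (it is enough to consider real-valued functions),

$\langle x, fh_i(T)y\rangle \le (f(\lambda_i) + \epsilon)\mu(C_i) = (f(\lambda_i) + \epsilon)\mu(B_i) + (f(\lambda_i) + \epsilon)(\mu(C_i) - \mu(B_i))$
$-\langle x, fh_i(T)y\rangle \le -f(\lambda_i)\mu(B_i) + \epsilon\mu(B_i) + (f(\lambda_i) - \epsilon)(\mu(B_i) - \mu(A_i))$
$\therefore\ |\langle x, fh_i(T)y\rangle - f(\lambda_i)\mu(B_i)| \le \epsilon\mu(B_i) + (|f(\lambda_i)| + \epsilon)(\mu(C_i) - \mu(A_i))$

$|\langle x, f(T)y\rangle - \sum_i f(\lambda_i)\mu(B_i)| = |\sum_i \langle x, fh_i(T)y\rangle - \sum_i f(\lambda_i)\mu(B_i)|$
$\le \sum_i |\langle x, fh_i(T)y\rangle - f(\lambda_i)\mu(B_i)|$
$\le \sum_i (|f(\lambda_i)| + \epsilon)(\mu(C_i) - \mu(A_i)) + \epsilon\mu(B_i)$
$\le (\|f\|_C + \epsilon)\epsilon + \epsilon$.

Hence, in the limit $\epsilon \to 0$, $\langle x, f(T)y\rangle = \int_{\sigma(T)} f\,d\mu_{x,y}$.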
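The map $f \mapsto f(T)$ is a *-morphism from $L^\infty(\sigma(T))$ to $B(H)$: Linearity is immediate, $\langle x, (f + \lambda g)(T)y\rangle = \int_{\sigma(T)} (f + \lambda g)\,d\mu_{x,y} = \langle x, (f(T) + \lambda g(T))y\rangle$.
$\bar{f}(T) = f(T)^*$ since

$\langle x, \bar{f}(T)y\rangle = \int \bar{f}\,d\mu_{x,y} = \overline{\int f\,d\mu_{y,x}} = \overline{\langle y, f(T)x\rangle} = \langle f(T)x, y\rangle = \langle x, f(T)^*y\rangle$.

$fg(T) = f(T)g(T)$ follows from

$\int_{\sigma(T)} d\mu_{x,g(T)y} = \langle x, g(T)y\rangle = \int_{\sigma(T)} g\,d\mu_{x,y}$
$\Rightarrow \langle x, fg(T)y\rangle = \int fg\,d\mu_{x,y} = \int f\,d\mu_{x,g(T)y} = \langle x, f(T)g(T)y\rangle$. □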


In particular, $T = \int_{\sigma(T)} \lambda\,dE_\lambda$. This result, and the next one, are often claimed to be the pinnacle of the subject of functional analysis.


Embedding in B(H) 


Theorem 15.48 Gelfand-Naimark 


Every C*-algebra is embedded in $B(H)$, for some Hilbert space $H$.


Proof We have already seen that every Banach algebra $\mathcal{X}$ is embedded in $B(\mathcal{X})$ (Theorem 13.8); as in the proof of that theorem, we will again denote elements of $\mathcal{X}$ by lower-case letters. The main difficulty is that there is no natural inner product defined on $\mathcal{X}$ or $B(\mathcal{X})$. Rather there are many semi-inner-products, one for each $\phi \in \mathcal{S}$, $\langle x, y\rangle_\phi := \phi(x^*y)$.

Let $M_\phi := \{x : \phi(x^*x) = 0\}$; it is a closed left-ideal, since for any $a \in \mathcal{X}$ and $x \in M_\phi$, we have $ax \in M_\phi$:

$0 \le \phi(x^*a^*ax) \le \phi(x^*x)\|a\|^2 = 0$.

This allows us to turn $\mathcal{X}/M_\phi$ into an inner product space, which can be completed to a Hilbert space $H_\phi$ (Examples 10.7(2) and 13.5(6)). The inner product on $\mathcal{X}/M_\phi$ is given by

$\langle x + M_\phi, y + M_\phi\rangle := \phi(x^*y)$.


The *-morphism $L : \mathcal{X} \to B(H_\phi)$: For any $a \in \mathcal{X}$, consider the linear map defined by $L_a(x + M_\phi) := ax + M_\phi$ on $\mathcal{X}/M_\phi$; this is well-defined since $aM_\phi \subseteq M_\phi$. It is continuous with $\|L_a\| \le \|a\|$ since

$\|L_a(x + M_\phi)\| = \|ax + M_\phi\| = \sqrt{\phi(x^*a^*ax)} \le \sqrt{\phi(x^*x)}\,\|a\| = \|a\|\|x + M_\phi\|$.

This map extends uniquely to one in $B(H_\phi)$ (Example 8.9(4)).
Clearly $L_a$ is linear in $a$, $L_{ab} = L_aL_b$, and $L_1 = I$, but it also preserves the involution, $L_{a^*} = L_a^*$,

$\langle x + M_\phi, L_a(y + M_\phi)\rangle = \phi(x^*ay) = \phi((a^*x)^*y) = \langle L_{a^*}(x + M_\phi), y + M_\phi\rangle$.

It remains a *-morphism when extended to $B(H_\phi)$, by continuity of the adjoint.

John von Neumann (1903-1957) Originally from Budapest, he studied in Berlin, under Weyl and Polya, but graduated at 23 years under Fejér in Budapest with a thesis on ordinal numbers. A young party-going genius, in 1926-30 he defined Hilbert spaces axiomatically as foundation for the brand new quantum mechanics and generalized the spectral theorem to unbounded self-adjoint operators. In the 1930s he went to the Princeton Institute, proved the ergodic theorem, and studied rings of operators and group representations; only turbulent fluid dynamics proved too hard (it remains unsolved today); in 1944 he started game theory, proving the mini-max theorem, then on to computers and automata theory.

Fig. 15.1. von Neumann


The final Hilbert space: However $L$ need not be 1-1. To remedy this deficiency, let $H := \prod_{\phi\in\mathcal{S}} H_\phi$ be the Hilbert space of “sequences” $x := (x_\phi)_{\phi\in\mathcal{S}}$ such that $x_\phi \in H_\phi$ and $\sum_{\phi\in\mathcal{S}} \langle x_\phi, x_\phi\rangle_{H_\phi} < \infty$; it has the inner product

$\langle x, y\rangle := \sum_{\phi\in\mathcal{S}} \langle x_\phi, y_\phi\rangle_{H_\phi}$.

It is straightforward to show that $H$ is indeed a Hilbert space, by analogy with $\ell^2$.
Let $J_ax := (L_ax_\phi)_{\phi\in\mathcal{S}}$, so that $J_a : H \to H$ is obviously linear, and also continuous since

$\|J_ax\|^2 = \sum_\phi \|L_ax_\phi\|^2 \le \|a\|^2\sum_\phi \|x_\phi\|^2 = \|a\|^2\|x\|^2$.

The mapping $a \mapsto J_a$, $\mathcal{X} \to B(H)$, is an algebraic *-morphism,

$\langle y, J_ax\rangle = \sum_\phi \langle y_\phi, L_ax_\phi\rangle = \sum_\phi \langle L_a^*y_\phi, x_\phi\rangle = \langle J_{a^*}y, x\rangle$.

Moreover it is 1-1, for if $J_a = 0$ then $L_ax_\phi = 0$ for any $x_\phi$ and $\phi \in \mathcal{S}$, in particular $a + M_\phi = L_a(1 + M_\phi) = 0$. But this means that for all $\phi \in \mathcal{S}$, $a \in M_\phi$, i.e., $\phi(a^*a) = 0$, and this can only hold when $\sigma(a^*a) \subseteq \mathcal{S}(a^*a) = 0$, so $\|a\|^2 = \|a^*a\| = 0$ and $a = 0$.

Since every such *-morphism between C*-algebras is isometric, the theorem is proved. □


Exercises 15.49 


1. Examples of polar decompositions are $\begin{pmatrix} 0 & -2 \\ \cdots & \cdots \end{pmatrix} = \cdots$ and $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.89 & 0.45 \\ -0.45 & 0.89 \end{pmatrix}\begin{pmatrix} 0.89 & 0.45 \\ 0.45 & 1.34 \end{pmatrix}$.

2. If $T$ is a compact operator in $B(H)$ with singular values $\lambda_n$, and singular vectors $e_n$, $e_n'$, then $|T|e_n = \lambda_ne_n$ and $U : e_n \mapsto e_n'$.

3. The polar decomposition of the right-shift operator in $\ell^2$ is trivial: $|R| = I$. What is it for the left-shift operator?

4. $T^* = |T|U^*$, $|T| = U^*T = T^*U$, and $|T^*| = UT^* = TU^*$, since $U^*U$ is a projection onto $\overline{\operatorname{im}}|T|$ and $UU^*$ is a projection onto $\overline{\operatorname{im}}\,T$. $\||T|\| = \|T\|$.

5. (a) $T$ is normal $\Leftrightarrow |T^*| = |T|$,
(b) $T$ is positive self-adjoint $\Leftrightarrow T = |T|$,
(c) $T$ is unitary $\Leftrightarrow |T| = I$ AND $T$ is invertible.

6. If $|S| = |T|$ and $T$ is invertible then $ST^{-1}$ is unitary.

7. When $T$ is compact normal, with polar decomposition $T = |T|U = U|T|$, then $U$ and $|T|$ are simultaneously diagonalizable, $U = P^{-1}e^{i\Theta}P$, $|T| = P^{-1}DP$, so that $T = P^{-1}De^{i\Theta}P$.

8. Adapt the proof of the Polar Decomposition theorem to show that if $T^*T \le S^*S$ then the map $U : \operatorname{im} S \to \operatorname{im} T$, $Sx \mapsto Tx$, is a well-defined operator with $\|U\| \le 1$ and $T = US$.

9. Every ideal $\mathcal{I}$ in $B(H)$ is a *-ideal since
$T \in \mathcal{I} \Rightarrow |T| = U^*T \in \mathcal{I} \Rightarrow T^* = |T|U^* \in \mathcal{I}$.

10. Every invertible element $T$ of a C*-algebra can be written uniquely as $T = U|T|$ where $U$ is unitary.

11. Trace-class Operators: Let $Tr := \{T \in B(H) : \operatorname{tr}|T| < \infty\}$ with norm $\|T\|_{Tr} := \operatorname{tr}|T|$ (Proposition 15.30 and Examples 15.33(6)).
(a) $\|T\|_{Tr} = \||T|^{\frac12}\|_{HS}^2$, and $T \in Tr \Leftrightarrow |T|^{\frac12} \in HS$,
(b) $\operatorname{tr}(T)$ is independent of the orthonormal basis,
(c) $|\operatorname{tr}(ST)| \le \|S\|\|T\|_{Tr}$; in particular $\|T\|_{HS} \le \|T\|_{Tr}$,
(d) $Tr$ is a closed *-ideal in $B(H)$,
(e) $T \in Tr \Leftrightarrow T = AB$ where $A, B \in HS$,
(f) $\operatorname{tr}|T| = \sum_n |\lambda_n|$, where $(\lambda_n)$ are the singular values of $T$ (repeated according to their multiplicities). $\operatorname{tr} T = \sum_n \lambda_n$ holds when $T$ is normal and $\lambda_n$ are its eigenvalues.

12. GNS construction: When $\mathcal{X}$ is represented in $B(H)$, every state $\phi \in \mathcal{S}(\mathcal{X})$ is associated with a unit vector $x \in H$, such that $\phi y = \langle x, J_yx\rangle$.

Proof The vector in question is $x := (x_\psi)_{\psi\in\mathcal{S}}$ where $x_\phi = 1 + M_\phi$ and $x_\psi = 0$ otherwise. For every $y \in \mathcal{X}$,

$\phi(y) = \langle 1 + M_\phi, y + M_\phi\rangle_{H_\phi} = \langle x_\phi, L_yx_\phi\rangle_{H_\phi} = \sum_\psi \langle x_\psi, L_yx_\psi\rangle_{H_\psi} = \langle x, J_yx\rangle_H$.

Remarks 15.50 


1. The Banach algebra axiom $\|1\| = 1$ is redundant for C*-algebras as it follows from $\|T^*T\| = \|T\|^2$ (assuming $\mathcal{X} \ne 0$).

2. The use of $A < B$ is best avoided: it may either mean $A \le B$ but $A \ne B$ or that $\sigma(B - A) \subseteq ]0, \infty[$.


Hints to Selected Problems 


2.2 (1) Writing a := x —z, b := z — y, and substituting into |a + b| < |a| + |b| 
gives the triangle inequality. 

2.3 (2) (a) $\exists a \in A$, $\exists b \in B$, $d(a, b) < 2$, (b) $\forall \epsilon > 0$, $\exists a \in A$, $\exists b \in B$, $d(a, b) < \epsilon$.

2.14 (5) The two sets have, respectively, the shapes of a diamond, and a square with 
a smaller concentric square removed. 

(9) For example, R \ a. 

2.20 (2) The complement of the set is $\{x \in \mathbb{Q} : x^2 > 2\}$ since $\sqrt{2}$ is irrational. To prove the set is open, one needs to find a small enough $\epsilon$ such that


22 Ge 24° 9a he. 


(6) Try the graph of the exponential function and the x-axis in $\mathbb{R}^2$.

(7) The Cantor set is the intersection of all of these closed intervals. 

(8) First show the set {x € [0, 1] : "**"« < 5} for fixed k is closed. 

(10) The answer to the first question is of course no: all points on a circle are equally close to the center; the second is also false, e.g. in $\mathbb{Z}$; it is true however in $\mathbb{R}^2$ because the line joining an interior point to $x$ contains closer points. What properties does the metric space need to have for this statement to be true?

(13) No. Take the subsets $A := [-1, 1]$ and $B := \mathbb{R} \setminus \{0\}$ in $\mathbb{R}$.

2.22 (2) Any ball $B_r(x)$ will contain a point $a$ of the dense open set $A$. There will therefore be a small ball $B_\epsilon(a) \subseteq A \cap B_r(x)$ which contains a point $b \in B$.

(3) The complement of the Cantor set is open and dense.

(5) $\partial U = \bar{U} \setminus U$ contains no balls.


3.5 (Ic) n/a" = n/(1 +8)" < oh < si 0. 


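(1e) $a_n = (1 + \frac1n)^n = 2 + \frac{1}{2!}(1 - \frac1n) + \frac{1}{3!}(1 - \frac1n)(1 - \frac2n) + \cdots + \frac{1}{n^n}$. So $a_{n+1} > a_n$, yet $a_n < 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots < 2 + \frac12 + \frac14 + \cdots = 3$.

(1f) $a_n \to \infty$ means $\forall \epsilon > 0$, $\exists N$, $n > N \Rightarrow a_n > \epsilon$.

(2) The limits must satisfy $x = 2 + \sqrt{x}$ and $x = 1 + 1/x$ respectively.

(3) Eventually, $|a_n| \le a < 1$, so $|a_n|^n \le a^n$.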
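Hint (2) can be explored numerically. The sketch below assumes that the sequences in question are the fixed-point iterations $x_{n+1} = 2 + \sqrt{x_n}$ and $x_{n+1} = 1 + 1/x_n$ started at $x_0 = 1$; the exercise statement is not reproduced here, so this is an assumption made purely for illustration.

```python
import math

def iterate(g, x0, steps=60):
    """Apply x -> g(x) repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = g(x)
    return x

a = iterate(lambda x: 2 + math.sqrt(x), 1.0)   # candidate limit solves x = 2 + sqrt(x)
b = iterate(lambda x: 1 + 1 / x, 1.0)          # candidate limit solves x = 1 + 1/x

print(a, b)
```

The fixed-point equations give $x = 4$ and $x = (1 + \sqrt{5})/2$ respectively, which is what the iterates approach.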
3.12 (3) See Proposition 7.8

(4) If $x_n \to x$ then $x \ne 0$ and $|x_n| \ge c > 0$, so that $|1/x_n - 1/x| = |x - x_n|/|x_n||x| \to 0$.

(10a) The map $f(x) = (\cos x, \sin x)$ is a continuous bijective map from $[0, 2\pi[$ to the circle. The inverse map is discontinuous at $(1, 0)$.

(10b) Take $f$ to be a constant function, and $x_n = n$.

(10c) Take $f(x) := x^2$ and $U := ]-1, 1[$. Examples of open mappings on $\mathbb{R}$ are polynomials which have no local maxima/minima.

(11) $(f^{-1}F)^c = f^{-1}F^c$ is open. The identity map $[0, 1[ \to [0, 2]$ is a continuous open mapping whose image is not closed.

(17) d(x, A)/(d(x, A) + d(x, B)). 

(18) All non-empty open intervals of the type $]a, b[$ are homeomorphic to, say, $]0, 1[$ by stretching and translating. $]0, 1[$ is homeomorphic to $]0, \infty[$ via $x \mapsto 1/x - 1$, and this, in turn, is homeomorphic to $\mathbb{R}$ via $x \mapsto x + \sqrt{x^2 + 1}$ (for example). Similarly, $]a, \infty[$ and $]-\infty, b[$ are homeomorphic to them as well.

(19) Points { x } are open in N but not in Q. 

4.10 (1) The difference between the nth and mth terms of decimal approximations is at most $10^{-\min(m,n)}$.

(4) The finite number of values have a minimum distance $\epsilon$ between them.

(5) Taking $m < n$,

$d(x_n, x_m) \le d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m) \le a(c^{n-1} + \cdots + c^m) \le \frac{ac^m}{1 - c} \to 0$ as $m, n \to \infty$.

Note that $\sum_n 1/n \to \infty$.


(6) $|d(x_n, y_n) - d(x_m, y_m)| \le |d(x_n, y_n) - d(y_n, x_m)| + |d(x_m, y_n) - d(x_m, y_m)| \le d(x_n, x_m) + d(y_n, y_m)$.

(7) For example, the continuous function $f(x) := 1/x$, defined on $]0, 1] \to [1, \infty[$, maps the Cauchy sequence $(1/n)$ to the unbounded sequence $(n)$.

(9) $\sqrt{n+1} - \sqrt{n} = \sqrt{n}\big((1 + 1/n)^{1/2} - 1\big) \le \frac{1}{2\sqrt{n}}$.

(11) If { x, } are the values of a Cauchy sequence, and x is a boundary point, then 
there is a subsequence x, — x (by Proposition 3.4). 

(14) Any Cauchy sequence in a discrete metric space must eventually be constant. 
(15) The intersection of the balls can contain at most one point, since $r_n \to 0$. In fact, if $x_n \to x$, then $x \in B_{r_n}[x_n]$ for all $n$, since the balls are nested.

(16) First show that $f(n) = f(1 + \cdots + 1) = nf(1)$, then $f(m/n) = \frac{m}{n}f(1)$.


4.17(1b) |(x2 — x1) y2 + X1(y2 — yD] < Cy2| + lai + x2) 1 — x2] < 3)x1 — x2], 
(xy + X2) (1 — X2) + (Y2 — yd) < 2|x1 — X21 + ly — Yrl- 
(5) Let $f : X \to Y$ be an equivalence; then every Cauchy sequence $(x_n)$ in $X$ corresponds to a Cauchy sequence in $Y$, by uniform continuity and Proposition 4.12. Since equivalences are homeomorphisms, $(x_n)$ converges precisely when $(f(x_n))$ does. So $X$ is complete $\Leftrightarrow$ $Y$ is complete.

4.21 (2) Repeat the proof of Proposition 2.10, using $B_{r_n}(a_n)$ instead of $B_{r(x)}(x)$, where $a_n$ is an approximation of $x$.

(4) Let $X$ be an uncountable set with the discrete metric. Then $B_{1/2}(x)$, for each $x \in X$, form an uncountable collection of disjoint sets.

5.7 (1) Take X \ { x1 } and X \ { x2 } as the open sets; alternatively take small enough 
balls. For (b), take X \ F; and X \ Fo. 

(2) To show that every subset of Q is disconnected, use the same idea with some 
other irrational. 

(5) Consider the open sets $f^{-1}\{0\}$ and $f^{-1}\{1\}$.

(11) Suppose f(a) < f(y); f(x) > f(y) is impossible else there is some z € [a, x] 
such that f(z) = f(y). 

5.12 (2) The metric space is the union of the path images, whose intersection contains 
the fixed point. 

(5) Use Theorem 5.9 with $A_y := X \times \{y\}$ and $B := \{x_0\} \times Y$.

(6) Without loss of generality, take x = 0; then R? \ { x } is connected using the unit 
circle and radial lines te for ¢ > —1 and unit vectors e. 

(8) Otherwise, the interior and exterior of the set would disconnect a component. 
(10a) If a component $C$ has a boundary point $a \notin C$, then $C \cup B_r(a)$ would be a 
strictly larger connected set. 

6.4 (3) If $B$ is bounded, so $B \subseteq B_r(x)$, then $\bar{B} \subseteq B_r[x]$. 

6.9 (3) From some $N$ onwards, $x_n \in B_\epsilon(x_N)$; cover the rest of the values $x_m$ with 
$B_\epsilon(x_m)$. 

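(4) Let $B \subseteq \bigcup_{i=1}^N B_{\epsilon/2}(x_i)$, then $\bar{B} \subseteq \bigcup_{i=1}^N B_{\epsilon/2}[x_i] \subseteq \bigcup_{i=1}^N B_\epsilon(x_i)$ 
(Theorem 2.19). 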

6.22 (6) Suppose $d(K, F) = 0$, then there are asymptotic sequences $a_n \in K$, $b_n \in F$; 
$(a_n)$ has a convergent subsequence, and therefore the corresponding subsequence of 
$(b_n)$ converges to the same limit. But then $K \cap F \ne \emptyset$. 

(7) After showing $K \subseteq B_r(r, 0)$, use the fact that there is a point $a \in K$ which has 
maximum distance from $(r, 0)$ less than $r$. 

(13) The unit sphere is a closed subset of the cube $[-1, 1]^N$. 

(16) $X \times Y$ is complete and totally bounded by Proposition 4.7 and Exercise 6.9(1). 

6.27 (1) If $f_n \to f$ with $f_n \in C(X, \mathbb{R})$, then $f_n(x) \to f(x)$ in $\mathbb{C}$, and taking the 
imaginary parts shows that $f(x) \in \mathbb{R}$. 

(4) $f(y) - f_n(y) \le f(y) - f_N(y) \le |f(y) - f(x)| + |f(x) - f_N(x)| + |f_N(x) - f_N(y)| < \epsilon$ 
for $n \ge N$, where $N$ depends on $x$, and $|x - y| < \delta$, small enough but independent 
of $x$ (Proposition 6.17). So $f - \epsilon < f_n \le f$ on $B_\delta(x)$ for $n \ge N$. By compactness, 
one $N$ will suffice. 

(5) Convert any binary sequence (of Os and 1s) into a “tent” function in C (R™); there 
are uncountably many such functions and their distance from each other is at least 1. 
(8) (x + al) /2 © x +:1)/2. 


7.7 (2) Balls look like circles, squares and diamonds in the 2-norm, $\infty$-norm, and 
1-norm respectively. 

(4) Let $A := \{|a_n|\}$, $B := \{|b_n|\}$. Then from Note 15 of Sect. 7.1, $\sup |\lambda a_n| = 
\sup |\lambda| A = |\lambda| \sup A$, $\sup |a_n + b_n| \le \sup(A + B) \le \sup A + \sup B$, and if $\sup A = 0$, 
then $0 \le |a_n| \le 0$ implying $a_n = 0$ for all $n$. 

(7) The functions $f_n := 1_{[0,1/n]}$ converge to 0 in $L^1[0, 1]$ but not in $L^\infty[0, 1]$. The 
inequality $\|x\|_{\ell^\infty} \le \|x\|_{\ell^1}$ remains true for sequences, so convergence in $\ell^1$ implies 
that in $\ell^\infty$. 
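
As a quick check of hint (7), the norms involved can be computed directly from the definitions. The following is only a short worked calculation; writing a generic sequence as $x = (a_n)$ is our notation here, not necessarily the exercise's.

\[
\|f_n\|_{L^1} = \int_0^1 1_{[0,1/n]}(t)\,dt = \frac{1}{n} \to 0, \qquad \|f_n\|_{L^\infty} = 1 \not\to 0,
\]
\[
\|x\|_{\ell^\infty} = \sup_n |a_n| \le \sum_n |a_n| = \|x\|_{\ell^1}.
\]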

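(8) For $r > \|x\|$, $x \in rC$, so $\lambda x \in \lambda rC = |\lambda| rC$, i.e., $\|\lambda x\| \le |\lambda| \|x\|$; but 
then $\|x\| \le \frac{1}{|\lambda|}\|\lambda x\|$. If $s > \|y\|$, then $x + y \in rC + sC = (r + s)C$, hence 
$\|x + y\| \le \|x\| + \|y\|$. 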

7.14 (3) Let $x, y \in \bar{C}$; then there are points $a, b \in C$ within $\epsilon$ of $x$ and $y$. So any 
point on the line $tx + (1 - t)y$ is also close to a point on the line $ta + (1 - t)b$ which 
lies in $C$ because 


$\|tx + (1 - t)y - ta - (1 - t)b\| \le t\|x - a\| + (1 - t)\|y - b\| < \epsilon.$ 


(4) A convex set $C$ is the union of line segments that start from a fixed point $x_0 \in C$; 
then use Theorem 5.9. 

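(5) If $\lambda a_n \to x$, $a_n \in A$, then $a_n \to x/\lambda$ (for $\lambda \ne 0$) and $x/\lambda \in \bar{A}$. Conversely, if 
$x \in \lambda\bar{A}$, i.e., $x = \lambda a$ with $a_n \to a$, then $\lambda a_n \to \lambda a = x$ and $x \in \overline{\lambda A}$. 

Similarly, when $a_n \to a$, $a_n \in A$, and $b_n \to b$, $b_n \in B$, then $a_n + b_n \to a + b$, 
so $a + b \in \overline{A + B}$. An example in $\mathbb{R}$ is $A := \{n + 1/n : n = 2, 3, \ldots\}$ and 
$B := \{-n : n = 2, 3, \ldots\}$. 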

7.20 (1b) py he x= pane Fy xi y xj > Oas N > o, since convergent 
sequences are Cauchy. 

(3) The odd sub-sums $a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots$ are decreasing, and bounded 
below by the increasing even sub-sums $(a_1 - a_2) + (a_3 - a_4) + \cdots$. 

7.22 (5) Applying the Cauchy test to p : the series n 2” /(2"P) a only 
when p— 1 <0; forp=1,>,2 becomes >> which 
diverges; etc. 

(11) For $N$ large enough $\|x_1 + \cdots + x_N - x\| < \epsilon$ as well as $\sum_{n > N} \|x_n\| < \epsilon$. 
So for $k$ large enough that $n_1, \ldots, n_k$ include $1, \ldots, N$, 


n a 


diverges; >- 


nn nn ae n ne 


$\|x_{n_1} + \cdots + x_{n_k} - x\| \le \|x_1 + \cdots + x_N - x\| + \sum_{n > N} \|x_n\|.$ 


8.10 (3) $\operatorname{im} R$ is closed since for $Rx_n \to y$, the first components give $0 \to y_0$, so 
$y = (0, y_1, \ldots) = R(y_1, \ldots)$. 

(4) Proof that $\operatorname{im} T$ is not closed: Let $v_N := (1, 1/2, \ldots, 1/N, 0, 0, 0, \ldots)$, then 
$Tv_N = (1, 1/4, \ldots, 1/N^2, 0, \ldots)$ converges to $(1, 1/4, \ldots) \in \ell^1$ as $N \to \infty$ since 


$\|(0, \ldots, 0, 1/(N+1)^2, \ldots)\|_{\ell^1} = \sum_{n=N+1}^\infty \frac{1}{n^2} \to 0.$ 


Yet, there is no sequence in $\ell^1$ which maps to this sequence since $(1, 1/2, 1/3, \ldots) \notin \ell^1$. 

(5) $e_n/n \to 0$ in $\ell^1$ because $\|(0, \ldots, 0, 1/n, 0, \ldots)\|_{\ell^1} = 1/n \to 0$, but $e_n \not\to 0$ 
since $\|(0, \ldots, 0, 1, 0, \ldots)\|_{\ell^1} = 1$. 

(6) T=L*-1. 

(9) If $x_n \to x$ then $x_n - x \to 0$ and $Tx_n - Tx = T(x_n - x) \to T0 = 0$. 

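(11) (a) If $Te_i$ are linearly independent, then $e_i$ are linearly independent. So if $STe_i$ 
are linearly independent, so are $Te_i$ and $\dim(\operatorname{im}(ST)) \le \dim(\operatorname{im}(T))$; the other 
statements follow from $\operatorname{im}(ST) \subseteq \operatorname{im}(S)$ and $\operatorname{im}(S + T) \subseteq \operatorname{im}(S) + \operatorname{im}(T)$. 

(b) If $e_1, \ldots, e_k$ form a basis for $\ker T$, extended by $e_{k+1}, \ldots, e_n$ to a basis for $X$, 
then $Te_i$, $i = k+1, \ldots, n$, form a basis for $\operatorname{im} T$. 

(c) Let $Te_i$, $i = 1, \ldots, k$, be a basis for $\ker S \cap \operatorname{im} T$. Extend $e_i$ with a basis $e'_j$ for 
$\ker T$. Then $STx = 0$ implies $Tx = \sum_i \alpha_i Te_i$, hence $x = \sum_i \alpha_i e_i + \sum_j \beta_j e'_j$. 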
(15) (3) |Z] = 1 = || Ril; ©), using £°°, |] S|] = 1, || 7] = 2; (8.4(8)) when Tx = ax 
on €1, [Tl] = |lallgo; (8.6(1)) Il f, ll = 1; (8.6(4)) | Fl] = 1; (8.6(6)) use ‘spike’ 
functions that are zero except near to 0; (12) ||@|| = 1; G4) IIT ]] = 1, |lZal] = 1, 


Mell = Ilglic;- 
(16) Proof for first matrix. Assuming, without loss of generality, that |u| < |A|, 


a r 
| (0 i (;) "=| Ga) |? = APP + la P iy? < APU? + Ly?) 


1 
G: Tx =Ax,s0 |A| < ||T]] < Al. 


(18) Choose unit $x_n$ such that $\|T_n x_n\| > \|T_n\| - 1/2^n$. 
8.14 (6) For x = (a;), take the supremum over i of 


so ||Tx|| < |A|||x||. However for x = 


|Tjiai + > Tjjaj| > (Tiillail — > ITij |x|) 
j#i J#i 
2 e|lx|| — (sup |Tji|) lel] — lai) © ellxl- 
I 


(8) If $J_X : X_1 \to X_2$ and $J_Y : Y_1 \to Y_2$ are the isomorphisms, then $J(T) := 
J_Y T J_X^{-1}$ gives the required isomorphism; note that $J^{-1}(S) = J_Y^{-1} S J_X$. 

8.21 (3b) Show $y \mapsto (0, y) + X \times 0$ is an isometry. 

(5) Let $\{a_n\}$ be dense in $M$ and $\{b_m + M\}$ dense in $X/M$. Then $\{a_n + b_m\}$ is dense 
in $X$. 

8.25 (5) See the Hilbert cube Exercise 9.10(3). 

(6) Every point $x \in [\![e_1, \ldots, e_n]\!]$ is a boundary point (consider $x + \epsilon e_{n+1}$). 

9.4 (2) The functionals on $c$ are $y^\top$ ($y \in \ell^1$) and Lim. 

(6) $c_{00} \subseteq \ell^p$, so $\overline{\ell^p} = c_0$; $1/\log n$ does not belong to any $\ell^p$. 

9.7 (1) Let y, := Xn —x € '; then Dat [Ynil < P41 yl < € for some 
N and all n. But |ynt| +--+: + |ynn| > Oasn > 00, so >); |yni| < 2€. 
9.10 (4) It is enough to show $\|x - a\|_{\ell^1} < \epsilon$ for $a = (a_0, \ldots, a_N, 0, \ldots) \in c_{00}$, $N$ 
large enough. 

9.15 (2) Try |an|?/P e1, 

9.27 (2) Look at the dual spaces of $L^1[0, 1]$ and $c_0$ to see why they are not isomorphic. 
(6) Write 24- + 2wixé = 3 (x + io7&)? + 10€? to simplify the integral. 


2 
o2 


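10.10 (2) In Pythagoras' theorem, $\|y + z\|^2 = \|y\|^2$ exactly when $z = 0$. 
Consider 


$\left\|\sum_n x_n\right\|^2 = \sum_{n,m} \langle x_n, x_m\rangle \le \sum_{n,m} |\langle x_n, x_m\rangle| \le \sum_{n,m} \|x_n\|\|x_m\| = \left(\sum_n \|x_n\|\right)^2.$ 


This can only be an equality when $|\langle x_n, x_m\rangle| = \|x_n\|\|x_m\|$ for each $n, m$. 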
(5) Writing $x = \sum_n a_n v_n$ and $y = \sum_m b_m v_m$ for a basis $v_1, \ldots, v_N$, we find 


$\langle x, y\rangle = \sum_{n,m} \bar{a}_n b_m \langle v_n, v_m\rangle.$ 


(10) $(1, 1, 0, \ldots)$ and $(1, -1, 0, \ldots)$ do not satisfy the parallelogram law; write these 
as step functions for $L^1$ and $L^\infty$. 
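
To see the failure concretely, one can evaluate both sides of the parallelogram law $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$ for $x = (1, 1, 0, \ldots)$ and $y = (1, -1, 0, \ldots)$, so that $x + y = (2, 0, 0, \ldots)$ and $x - y = (0, 2, 0, \ldots)$. The following worked check assumes the sequences are measured in the $\ell^1$ and $\ell^\infty$ norms.

\[
\ell^1:\quad \|x+y\|^2 + \|x-y\|^2 = 2^2 + 2^2 = 8 \ne 16 = 2\cdot 2^2 + 2\cdot 2^2 = 2\|x\|^2 + 2\|y\|^2,
\]
\[
\ell^\infty:\quad \|x+y\|^2 + \|x-y\|^2 = 2^2 + 2^2 = 8 \ne 4 = 2\cdot 1^2 + 2\cdot 1^2 = 2\|x\|^2 + 2\|y\|^2.
\]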

(12) 7, sin(x) cos(x) dx = 5 Ion sin(2x) dx = [—cos2x]™, = 0, and fo 2x3 — 
xdx = six4 _ x1, = 0. 

(15) Substitute $\lambda = \alpha + i\beta$, then find the minimum by differentiating in $\alpha$, $\beta$ to get 
$\lambda = -\langle x, y\rangle$. 

(16) $\|x_n - x_m\| \le \|x_n + y_n - x_m - y_m\| \to 0$ since $\langle x_n - x_m, y_n - y_m\rangle = 0$. 

(17) The ‘inner product’ remains continuous, so Z is closed. 

10.15 (1) Answer 74 (22x0 + 2yo0 — 62z0, x9 + 19yo — 3z0, —6x0 — 3y0 + 2720). 
(2a) $Px \in M$ so $Px = \lambda y$, and $x - Px \in M^\perp$, so $\langle y, x - \lambda y\rangle = 0$. Expanding gives 
$\lambda = \langle y, x\rangle$. 

(3) Consider $x \in M^\perp$, and $x = a + b$ where $a \in M$, $b \in N$; since $N \subseteq M^\perp$ it 
follows that $a = 0$. 

(5) Any vector $x \in N$ can be written $x = a + b$ where $a \in M$, $b \in M^\perp$. Since 
$M \subseteq N$, then $b = x - a \in N$ as well. 

(6) Let $x = a + b$, $a \in M$, $b \in M^\perp$; then $Tx = Ta + Tb$, $Ta = Aa \in M$, 
$Tb = Bb \in M^\perp$, 


$\frac{\|Tx\|^2}{\|x\|^2} = \frac{\|Ta\|^2 + \|Tb\|^2}{\|a\|^2 + \|b\|^2}$ 

by Pythagoras' theorem. But $\|Ta\| \le \|A\|\|a\|$ and $\|Tb\| \le \|B\|\|b\|$, so $\|T\|^2 \le 
t\|A\|^2 + (1 - t)\|B\|^2$, where $t = \|a\|^2/(\|a\|^2 + \|b\|^2)$. Now take $t = 0$ or $t = 1$ 
depending on which is the maximum of the two. 

(8b) Expand $d^2 \le \|x - y\|^2 = 2 - 2\operatorname{Re}\langle x, y\rangle$ with $y = e^{i\theta}v$. 

(9c) If $\|x - a\| = d = \|x - b\|$ is the shortest distance from $x$ to $M$, then 
$\|tx + (1 - t)x - ta - (1 - t)b\| = d$. 

(9d) The closest sequence would be $\mathbf{1} \notin c_0$. 
(10) $\|P_1 y_{n+1}\| \le \|y_{n+1}\| \le \|y_n\|$. So $\|y_n\|$ converges. But in general, as $Py \perp 
(y - Py)$, $\|y\|^2 = \|Py\|^2 + \|y - Py\|^2$, so $\|y_n - P_1 y_n\| \to 0$, and similarly 
$P_1 y_n - P_2 P_1 y_n \to 0$. In finite dimensions, the bounded sequence $y_n$ has a convergent 
subsequence, $y_{n_i} \to y$, so $y = P_1 y = P_2 P_1 y$, and $y$ is in $\operatorname{im} P_1 \cap \operatorname{im} P_2$. 

(11) $\sin x \approx 0.955 - 0.304x \approx -0.20 + 1.91x - 0.88x^2 + 0.093x^3$; $1 - x^3 \approx 
1.13\cos x - 0.43\sin x$. 

(12c) Answer: a = 2 a p= 2 2I- a 

10.18 (2) Check that $\|x^*\|_{H^*}$ satisfies the parallelogram law, then use the polarization 
identity, noting that $(ix)^* = -ix^*$. 

(3) my corresponds to Px. 

(4) The map x +> ((x, )) is a functional so corresponds to some vector Tx. 

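10.26 (2) $\|T\|^2 = \|T^*T\| \le \|T^*\|\|T\|$, so $\|T\| \le \|T^*\| \le \|T^{**}\| = \|T\|$. 

(3) For $x = (a_n)$, $y = (b_n)$, $z = (c_n)$, 


$\langle z, yx\rangle = \sum_n \bar{c}_n b_n a_n = \sum_n \overline{\bar{b}_n c_n}\, a_n = \langle \bar{y}z, x\rangle.$ 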


(5) fy SVS (x) dx = fy fF s@ FW drdy = fy [! gO FW dy de. 
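(8) $T^*Tx = 0 \Rightarrow 0 = \langle x, T^*Tx\rangle = \langle Tx, Tx\rangle$. 


(9) Fix a unit vector $u \in X$, $\lambda := \langle Tu, Tu\rangle \ge 0$, and let $v$ be any orthogonal 
unit vector; then $\langle Tu, Tv\rangle = \langle u, v\rangle = 0$; similarly, $\langle T(u+v), T(u-v)\rangle = 
\langle u+v, u-v\rangle = 0$, so $\langle Tv, Tv\rangle = \lambda \ge 0$ constant. For vectors $x = \alpha u$, 
$y = \beta_1 u + \beta_2 v$, $\langle Tx, Ty\rangle = \bar{\alpha}\beta_1\lambda = \lambda\langle x, y\rangle$. 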

(11) Answers: (—5/2, —2/3, 7/6), (—17, —5, 7)/3. 

(15) T'T is the projection onto ker T+; TT" is the projection onto im T. 

(17) V*Vf =Vigis fi fo fi drdx = f/ g(x) dx. 

(18) Answer: $r = 0.497\,\mathrm{m}$ and $\kappa/m = 0.0062\,\mathrm{m}^{-1}$ (the actual values used to generate 
the data were $r = 0.5\,\mathrm{m}$ and $\kappa/m = 0.003\,\mathrm{m}^{-1}$). 

10.35 (1) Take the inner product of $\sum_n \alpha_n e_n = 0$ with $e_m$. 

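(4) $\langle (e_n, 0), (0, \tilde{e}_m)\rangle = \langle e_n, 0\rangle + \langle 0, \tilde{e}_m\rangle = 0$; if $x$ and $y$ can be approximated by 
$x_N := \sum_{n=1}^N \alpha_n e_n$ and $y_N := \sum_{m=1}^N \beta_m \tilde{e}_m$ respectively, then 


$\|(x, y) - (x_N, y_N)\| = \|(x - x_N, y - y_N)\| = \sqrt{\|x - x_N\|^2 + \|y - y_N\|^2}$ 


can be made small; note that $(x_N, y_N) = (x_N, 0) + (0, y_N) = \sum_n \alpha_n (e_n, 0) + 
\sum_n \beta_n (0, \tilde{e}_n)$. 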
(5) (x — Xx, @n) = 0 
(6) Suppose $e_n$ and $Ue_n$ are both orthonormal bases. Then, by Parseval's identity, 


$\langle Ux, Uy\rangle = \sum_{n,m} \bar{\alpha}_n \beta_m \langle Ue_n, Ue_m\rangle = \langle x, y\rangle.$ 


$U$ is onto because $y = \sum_n \alpha_n Ue_n = U(\sum_n \alpha_n e_n)$. 
Conversely, if $\{e_n\}$ is an orthonormal basis for $H_1$, and $y \in \{Ue_n\}^\perp$, then $0 = 
\langle y, Ue_n\rangle = \langle U^*y, e_n\rangle$ for all $n$, so $U^*y \in \{e_n\}^\perp = 0$ and $\|y\| = \|U^*y\| = 0$. 
The column vectors of the matrix of $U$ are $Ue_n$, so $\langle Ue_n, Ue_m\rangle = \langle e_n, e_m\rangle = \delta_{nm}$. 


(8) For example, take 2 (; ; 5 ( A) For the second part, substitute e,, instead of 
x, and deduce orthogonality; if x € { ey }+, then |||] = 0. 

(9) Show x = s+ + Dingo 7 e°"inx then take x = 1/4. Itis interesting to generate 
other series using other points and functions (e.g. |x|, x/|x|, | sin x|). 

(10) For $f$ odd about $1/2$, $a_{-n} = -a_n$. In general, every $f$ is the sum of an even 
and an odd function. 

11.7 (4b) Continuity of T(Sx) := Tx: For any v € kerS and y € Y, ||Ty|| = 
[Px] = ||P @ + v)l| < ell Tlllx + vl], then use ||x + ker S|] < cl] Sx]. 

(5) lanl = llonenll < | 2, we: — STF axe; || < cll. 

11.15 (6) If $x_n \in B_r(0)$ then $Tx_n \in TB_r(0)$, so has a Cauchy subsequence, which 
converges. 

11.26 (4) The requirement is d(x, y) = x +Ay, |x +Ay| < |x| + lyl, so [A] < 1. 
(8) +® = 0, so (+®)+ = €'*. Now in the correspondence of ¢!* with 2°, we get 
[®]] = coo and so [®]] = co. 

(9) |bx| = |@(x+a)| < ||| |x + a|| for any a € M; in fact this approaches equality 
for certaina € M,so ||w|| = ||@||. Onto: for any y € (X/M)*, let @x := W(x+M),. 
Hint for the second part: the norm of ||¢ + M+]| = infycy ld + Yl is the same 
as ||| al. 

11.32 (5) $(T^{\top\top}x^{**})\phi = x^{**}(T^\top\phi) = (T^\top\phi)x = \phi Tx$. 

(7) If T' is onto, then T is 1-1 by (1) and has a closed image; if T'' is also 1-1, then 
im T is dense, hence T is onto. If T is onto, use the open mapping theorem. 

11.42 (1) For $c_0$, a functional is of the type $y^\top$ where $y = (b_n) \in \ell^1$. Now $y \cdot e_n = 
\sum_i b_i \delta_{ni} = b_n \to 0$ as $n \to \infty$ since $\ell^1 \subseteq c_0$. 

(2) Use the functional $e_i \cdot x_n = a_{ni}$. The converse is true for $\ell^p$, $1 < p < \infty$, 
whose dual space has the Schauder basis $e_i^\top$; any $\phi \in \ell^{p*}$ can be approximated by 
$\sum_{i=0}^N b_i e_i^\top$. So 


$\phi x_n \approx \sum_{i=0}^N b_i e_i^\top x_n = \sum_{i=0}^N b_i a_{ni} \to \sum_{i=0}^N b_i a_i \approx \phi x.$ 


For $\ell^1$, a functional is of the same type but $y \in \ell^\infty$. This time $y \cdot e_n = b_n$ need not 
converge to 0, e.g. $y := \mathbf{1}$. 

(4) If $T_n \rightharpoonup T$ and $T_n \rightharpoonup S$, then $\phi Tx = \phi Sx$ for all $\phi$ and $x$, so $T = S$. 

(1b) |6 (Ta Sn — TS)x| < OM|TrlGSn — Sxll + |e UnS — TS)x| > 0. 

(15) If $x \notin M$, there is a $\phi \in X^*$ such that $\phi x = 1$, $\phi M = 0$, so $x$ is not a weak 
limit point of $M$. More generally, every closed convex set is weakly closed, because 
a hyperplane (so a functional) separates it from any point not in it. 
12.12 (4) $\|o(h)\| = \|f(x + h) - f(x) - f'(x)h\| = \left\|\int_0^1 \frac{d}{dt} f(x + th) - f'(x)h\,dt\right\|$ 


$\le \int_0^1 \|f'(x + th) - f'(x)\|\|h\|\,dt$ 
$\le \frac{1}{2}k\|h\|^2.$ 
12.21 (5) Poles and residues are (a) i: 1/2ie, and —i : ie/2; (b) 1: (e, e~!)/3, and 


w : (e®, e-) /3w2, and w? : (e% , e~®)/3a; (c) 0:1. 
13.3 (11) If 7 R is invertible, then P(ST) = 1 = (TR)Q, so T is invertible. 


13.10 (2) Each vector $(a, b)$ corresponds to the matrix $\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$. 
(4) 1,A,..., AN cannot be linearly independent, so A” = p(A) must be true for 


some polynomial p. 

(10) This is a generalization of the convolution operation on £!. The proofs are very 
similar to that case Exercise 9.7(2). 

(13) For any ¢, Tx@x = x@Tx, ie., Tx = Axx. So if x, y are linearly dependent 
then Ty = A,y, implying Tx = Ayx and Ay = Ax; if not, then Ay = Ayty = Ax. 
(14d) If $, T € A”, then TR = RT for any R € A’ D A”, including R = S. 

(16) To show Z4 C ZT, let f € Za, and let K be a closed subset of [0, 1] \ A; 
then for any x € K, one can find a function g, € Z such that g,(x) > lina 
neighborhood of x. By compactness of K, a finite number of such functions “cover” 
1 g(x) > 1 
g(x) ga) <1 
a continuous function with h|x = | and belonging to Z (h = gk). By making K 
larger, one can find a sequence of functions such that h,g > g,sog € TZ. 

(17) To show || f +Z,|| = || flall, it is required to find functions g, € Z,4 such 
that || f — gn\| — || flall. This can be done as follows: take B := [0, 1] \ U, where 
U = A+ B,(0), and let A be a function such that h|A = 0, h|B = 1; so fh € Za 
yet f — fh =0on B, and || f — fh\| > || f|All ase > 0. 

(19) Multiplication is well-defined, for if S— S ¢ Z,T —T €T, then ST — ST = 
(S — S)T + S(T _ T) € Z. Associativity and distributivity follow from those of 
X. Suppose ||$ + Ay|| > |S + Z|, ||T + Ball — 7 + Z|, for some A,, B, € Z, 
then 


K,sog := gy, +---+ 8, € Z is greater than | on K. Let h(x) := | 


|ST + Z|] < CS + An) + Br)|l < WS + An + Ball > |S +2117 + ZI 


Finally, ||1 + Z|] < ||1 + 0|| = 1 yet ||1 + Z|| 4 0; but also in any normed algebra 
in which ||ST'|| < ||S||||Z'|| holds, 1 < |||), since |J1|| = 17] < 1)’. 
(24) X has a basis of two vectors, which can be taken to be 1 = (; and ('). 


Multiplication by 1 acts of course as the identity matrix; if (’ () a (;). then 


0) () = (2) = 6) 0 


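9 (1) Answers (a) 0, (b) 1, (c) $\max(|a|, |b|)$, (d) ($a \ne 0$) 

$\begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}^n = \begin{pmatrix} a^n & na^{n-1} \\ 0 & a^n \end{pmatrix} = a^n \begin{pmatrix} 1 & n/a \\ 0 & 1 \end{pmatrix}.$ 


Now $\begin{pmatrix} 1 & n/a \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, so $1 \le \left\|\begin{pmatrix} 1 & n/a \\ 0 & 1 \end{pmatrix}\right\|^2 \le 2 + n^2/a^2$ (Example 7.9(2)). 


Taking the $n$th root gives $(2 + n^2/a^2)^{1/2n} \to 1$, so $\rho(T) = |a|$. Note how, in this 
case, $\|T^n\|$ first increases then decreases to 0. Only (c) has $\rho(T) = \|T\|$. 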
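
The behaviour described in (d) can be observed numerically. The sketch below is only an illustration under stated assumptions: it takes $T = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}$ with the arbitrary value $a = 0.9$, measures $\|T^n\|$ in the Euclidean operator norm (the exercise may intend a different norm), and uses helper names of our own choosing.

```python
import math

def power_of_jordan_block(a, n):
    """Closed form of T^n for T = [[a, 1], [0, a]] (assumed matrix for this sketch)."""
    return [[a ** n, n * a ** (n - 1)], [0.0, a ** n]]

def spectral_norm_2x2(m):
    """Euclidean operator norm of a real 2x2 matrix:
    square root of the largest eigenvalue of M^T M."""
    (p, q), (r, s) = m
    # Entries of the symmetric matrix M^T M = [[e, f], [f, g]].
    e = p * p + r * r
    f = p * q + r * s
    g = q * q + s * s
    largest = 0.5 * (e + g + math.sqrt((e - g) ** 2 + 4.0 * f * f))
    return math.sqrt(largest)

a = 0.9  # illustrative value with |a| < 1, so that ||T^n|| eventually decays
norms = [spectral_norm_2x2(power_of_jordan_block(a, n)) for n in range(1, 201)]
roots = [norms[n - 1] ** (1.0 / n) for n in range(1, 201)]

for n in (1, 5, 10, 50, 200):
    print(n, norms[n - 1], roots[n - 1])
```

With these choices the printed norms rise at first, then decay, while the $n$th roots drift slowly towards $|a|$; the convergence of the roots is slow because of the factor $n$ in the off-diagonal entry.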

(3) Use the Cauchy inequality for $|x + \alpha y| \le \sqrt{1 + |\alpha|^2}\sqrt{|x|^2 + |y|^2}$. 

(7) Let $R$ and $S$ be the radii of convergence of $\sum_n a_n z^n$ and $\sum_n b_n z^n$. Then 
$\sum_n (a_n + b_n)z^n = \sum_n a_n z^n + \sum_n b_n z^n$ has radius of convergence at least $\min(R, S)$. 
$\sum_n a_n b_n z^n$ has radius of convergence at least $RS$ since $\liminf |a_n b_n|^{-1/n} = \liminf |a_n|^{-1/n} 
|b_n|^{-1/n} \ge RS$. 

(8) f + g and fg have coefficients a, + by, dgby + aybn_1 +--+ + anbo. 


fog(T) =ay+aig(T) + ag(T) + 
= (ao + abo + anbj +--+) + (ajbj + 2anb, +---)T 
+ (ayby + agb} ++++)T? 


(9) IF) — Deegan Tl = Dye nT" < Dy ys lanl ITI" > O when 
||| < R. 

(14) $\cos 0 = e^0 = 1$, but $\cos 2 = (\cos 1 - \sin 1)(\cos 1 + \sin 1) < 0$, so there is a 
number $0 < \beta < 2$, $\cos\beta = 0$. Since the conjugate of $e^{i\theta}$ is $e^{-i\theta}$, it follows that 
$|e^{i\theta}| = 1$, so $\sin\beta = 1$; hence $e^{i\beta} = i$ and $e^{4i\beta} = 1$. 

(17) Expand e®!5e%7 e%37 e%4T to second order, and equate with eS +7 ~ 1+ (5+ 
T)+(S+T)?/2, to get a3 = 1/2; the two values can be chosen to be equal. 
13.25 (2) $fg = 1 \Rightarrow f(t) = 1/g(t) \ne 0$, $\forall t \in [0, 1]$. $g$ has a minimum 
distance to the origin (Exercise 6.22(10)), so $f = 1/g$ is also bounded. 

(4) $\|T^{-1}\| = \sup_x \|T^{-1}x\|/\|x\| = \sup_y \|y\|/\|Ty\|$. 

(8) 


$e^{(t+s)T} = e^{tT + sT} = e^{tT}e^{sT}$ since $(tT)(sT) = (sT)(tT)$. 
$e^{(t+h)T} = e^{tT}e^{hT} = e^{tT}(1 + hT + o(h))$ 
so the derivative at $t$ is $e^{tT}T$. 


(12) $SR = 0$ for $S(a_0, a_1, \ldots) := (a_0, 0, \ldots)$. But $\|RT\| = \|T\|$ for all $T$, so 
$RT_n \not\to 0$ when $T_n$ are unit elements. 
13.31 (2) If | f(z)| < elz|"/" < elz|*, then f is still a polynomial. 

(8) If $a$ is a zero or pole of order $\pm N$, then $(z - a)^{\mp N} f(z)$ is analytic and non-zero 
at $a$. Thus $qf/p$ is bounded analytic on $\mathbb{C}$, so must be constant. 

14.7 (2) $f(t) - \lambda$ is not invertible precisely when $f(t_0) - \lambda = 0$ for some $t_0 \in [0, 1]$. 
(4) $T^2 - z^2 = (T - z)(T + z)$, so $z^2 =: \lambda \in \sigma(T^2) \Rightarrow \pm z \in \sigma(T)$ (one 
of them). Conversely, if $T^2 - z^2$ has an inverse $S$, then $S(T + z)(T - z) = 1 = 
(T - z)(T + z)S$, so $T - z$ is invertible. 

(7) $(S, T) - \lambda(1, 1) = (S - \lambda, T - \lambda)$ is not invertible iff $S - \lambda$ or $T - \lambda$ is not 
invertible. 

(8) The map $T \oplus S - \lambda : (x, y) \mapsto (Tx - \lambda x, Sy - \lambda y)$ is invertible exactly when 
$T - \lambda$ and $S - \lambda$ are invertible. 

14.13 (1) $Rx = \lambda x$ means $a_n = \lambda a_{n+1}$, so $a_n = a_0/\lambda^n$; but also $0 = \lambda a_0$. There are 
no non-zero solutions to these algebraic equations. 

(3) $\ell^1$ is embedded in $\ell^1(\mathbb{Z})$, so $\sigma(L)$ decreases from the first case to the second. 
In fact, in $\ell^1(\mathbb{Z})$, there are no eigenvalues, because $\sum_{n \in \mathbb{Z}} |\lambda|^n$ cannot converge 
for any $\lambda$. Yet the boundary of $\sigma(T)$ in $\ell^1$, consisting of generalized eigenvalues, is 
preserved in $\ell^1(\mathbb{Z})$. 

(4) T'x = (ao, az, a3,...) on £!. 

(5) T'x = (ao, a2, a3/2,...) on £!. 

(9) The operator $(T - \lambda)f(x) = (x - \lambda)f(x)$ is invertible only when $\lambda \notin [0, 1]$. 
There are no eigenvalues because $xf(x) = \lambda f(x)$ for all $x$ implies $f = 0$. The image 
of $T - \lambda$ is a subset of $\{g \in C[0, 1] : g(\lambda) = 0\}$; as this set is closed and not $C[0, 1]$, 
all $\lambda \in [0, 1]$ are residual spectral values. 

(10) Induction on n: Expand VV” f as a double integral and change the order of 
integration. 

(12) $1 - |\lambda| \le \|Tx_n - \lambda x_n\| \to 0$; $T - \lambda = T(1 - \lambda T^{-1})$, so $|\lambda| < 1 \Rightarrow \lambda \notin \sigma(T)$. 
The boundary of $\sigma(T)$ must be part of the circle. 

(13) $T$ is 1-1 with a closed image $\Leftrightarrow$ $\|Tx\| \ge c\|x\|$, so 


$\|(T + A)x\| \ge \|Tx\| - \|Ax\| \ge (c - \|A\|)\|x\|$ 


shows $T$ is an interior point of the set. 

14.21 (7) The eigenvalue equation for ML is ay41 = nian, $0 ay = n!A" xq > CO. 
For RM, {0} =0,((RM)') Co,(RM). 

14.29 (3) e° 7) = o(e7) = o(1) = {1}, soo (T) C 27iZ. For an idempotent P, 
erP 14 PQni+ SO 4...) 14 PM —1)=1. 

14.40 (11) C% is generated by e;, where eje; =O wheni # j, and e;e; = 1. So 
a character satisfies de;de; = 0 and de; = +1. If de; = +1, say, then de; = 0 for 
i #1. In fact 1 = 6(1) = >’; 6(e;) = 6(e1). 


sé 10 01 00 00 
(12) B(C*) is generated by ({ 3: ( ) (; D: and (; ') A character 6 maps 


them to w), ..., wa, which must satisfy wy; = 0, w3 = 0, w2W3 = W1, W3W2 = wWa4, 


for which there are no non-zero solutions. 
(15) ¥ acts on the N points in A as (6;x) = (xj) = x. 
(17) In the commutative Banach algebra $\mathcal{Y} := \{S, T\}''$, the spectra remain the same, 
$\sigma_{\mathcal{Y}}(A) = \sigma(A)$, so $\Delta(A) = \sigma(A)$ and the inclusions follow from $\Delta(S + T) \subseteq 
\Delta(S) + \Delta(T)$ and $\Delta(ST) \subseteq \Delta(S)\Delta(T)$. 

15.3 (4) T is left- and right-invertible: TT*R = 1 = R/T*T. 

(7) Use Theorem 13.9; note that L*L = a € R, soa = 22. 

15.11 (5) What is meant is that if $T \in \mathcal{X}$ is normal, and $J$ is a *-morphism, then 
$J(T) \in \mathcal{Y}$ is also normal, etc. 

(8) The inverse of T, is T_g, which is the adjoint: 


[e@rserex = [ ee) Fe —ayax = [eeFaroa= [ TaeOsrour. 


(20) rey" "= |\CA* A)” + (B*BY" YA" < AIP" + BP)" > 
max(|| Alf, || Bll) 

(22) If $T^*T$ is idempotent, then $\sigma(TT^*) \subseteq \sigma(T^*T) \cup \{0\} \subseteq \{0, 1\}$. Hence 
$\sigma(TT^*TT^* - TT^*) = \{0\}$. 

15.15 (1) $\langle x, TT^*x\rangle = \|T^*x\|^2 = \|Tx\|^2 = \langle x, T^*Tx\rangle$ and use Example 10.7(3). 
(3) $\|T_n^*x - T^*x\| = \|(T_n - T)^*x\| = \|T_n x - Tx\| \to 0$. Conversely, take the limit 
of $\|T_n^*x\| = \|T_n x\|$ and use Exercise 1. 

(4) $|\lambda|^2\|x\|^2 = \|\lambda x\|^2 = \|Ux\|^2 = \|x\|^2$. 

(5) Each distinct eigenvalue comes with an orthogonal eigenvector. In a separable 
space, there can only be a countable number of these. 

(6) $\langle e_m, T^*e_n\rangle = \langle Te_m, e_n\rangle = \bar{\lambda}_n\delta_{nm}$, so $T^*e_n = \sum_m \langle e_m, T^*e_n\rangle e_m = \bar{\lambda}_n e_n$. Then 
show $\|T^*x\| = \|Tx\|$. 

(8) For (b), note thato (J —T”") = 1—o(T)" C By [1], so||7 — T" || = pU-T") < 
2. For (c) use H = ker(T* — 1) @ ker(T* — 1) = ker(T — J) @ im(T — J). 
15.20 (1) Use Example 10.7(3). 

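(3b) Let $M$, $M^\perp$ be the domains of $A$ and $D$. For any $x = a + b \in M \oplus M^\perp$, 


$\langle x, Tx\rangle = \langle a + b, Ta + Tb\rangle = \langle a, Ta\rangle + \langle b, Tb\rangle,$ 
$\langle x, x\rangle = \langle a + b, a + b\rangle = \|a\|^2 + \|b\|^2.$ 


As $\langle a, Ta\rangle = \|a\|^2\lambda$ with $\lambda \in W(A)$, and similarly $\langle b, Tb\rangle = \|b\|^2\mu$, $\mu \in W(D)$, 
the values of $\langle x, Tx\rangle/\|x\|^2$ include the line between $\lambda$ and $\mu$. The collection of 
these lines is the convex hull of $W(A) \cup W(D)$. 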

al 
(4b) For T := ( - 
because of the condition 1 = ||x||7 = |a|* + |B|?. But @B = cost sint e!? takes the 
value of any complex number in the closed ball By/2[0]. 
(11c) Leta := (T),,s00 =02 =07_, = ||T —All’. 


} let x = (‘). then (x, Tx) = |a|?a+@B + |B/?a =a+aPp, 
15.25 (1) The singular values are (i) 4 with singular vectors proportional to () ; () : 


2) fi ae A : 1 
and | with C a G) (ii) V3 with (>) ('). and | was ( 0 ) ( i) 
1 —1 


(7) Let $S := T/\lambda$ where $\lambda$ is the largest eigenvalue (in the sense of magnitude); it 
has the same eigenvectors $e_n$ as $T$ except with eigenvalues $\mu_n := \lambda_n/\lambda$. If $v_0 = 
\sum_n \alpha_n e_n + y$, where $y \in \ker(T - \lambda)$ then $S^k v_0 = \sum_n \alpha_n \mu_n^k e_n + y$. So 


$\|S^k v_0 - y\|^2 = \sum_n |\alpha_n|^2 |\mu_n|^{2k} \le c^{2k}\|v_0\|^2, \quad (0 < c < 1)$ 

k AK kup Skvo aky 
and S*vup > yask — oo. Hence - =) > , and v yo 
0 y AU ITF voll | S¥ voll y/llyll k+l ak yl] > 


the sequence does not converge unless $\lambda = |\lambda|$ but behaves like $e^{ik\theta}y/\|y\|$. 
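
The iteration analysed in this hint is what is commonly called power iteration: apply $T$ repeatedly and normalize. The following is a minimal sketch, not the book's algorithm; the $2 \times 2$ symmetric matrix, the starting vector, the step count, and the function names are all assumptions chosen for illustration. The dominant eigenvalue here is positive, which is the case in which the normalized iterates converge.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def normalize(v):
    """Scale a non-zero vector to Euclidean norm 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def power_iteration(m, v0, steps=50):
    """Iterate v <- Tv / ||Tv|| and return (eigenvalue estimate, unit vector)."""
    v = normalize(v0)
    for _ in range(steps):
        v = normalize(mat_vec(m, v))
    # Rayleigh quotient <v, Tv> of the final unit vector estimates the eigenvalue.
    tv = mat_vec(m, v)
    lam = sum(a * b for a, b in zip(v, tv))
    return lam, v

# Illustrative symmetric matrix with eigenvalues 3 and 1.
T = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_iteration(T, [1.0, 0.0])
print(lam, v)
```

The component of the starting vector along the other eigenvector shrinks by the factor $1/3$ at each step, which plays the role of the constant $c$ in the estimate above.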
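15.34 (6) Answers: (b) eigenvalues $1/((n + \frac{1}{2})\pi)$, eigenvectors $\sin(n + \frac{1}{2})\pi x$; (c) 
$1/((n + \frac{1}{2})^2\pi^2)$, $\sin(n + \frac{1}{2})\pi x$; (e) 


$\sum_{n=0}^\infty \frac{1}{(n + \frac{1}{2})^4\pi^4} = \int_0^1\int_0^1 \min(x, y)^2\,dy\,dx = 1/6.$ 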
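
The identity in the last display can be checked numerically. The sketch below assumes that the operator in question is the integral operator on $[0, 1]$ with kernel $\min(x, y)$, with eigenvalues $1/((n + \frac{1}{2})^2\pi^2)$ and eigenfunctions $\sin(n + \frac{1}{2})\pi x$; the truncation length, the quadrature rule, and the sample point are arbitrary choices of ours.

```python
import math

def eigenvalue(n):
    """Assumed n-th eigenvalue 1/((n + 1/2)^2 pi^2) of the kernel min(x, y)."""
    return 1.0 / (((n + 0.5) * math.pi) ** 2)

# Truncated sum of squared eigenvalues; the identity says this tends to 1/6.
sum_of_squares = sum(eigenvalue(n) ** 2 for n in range(10000))

def apply_kernel(f, x, m=2000):
    """Midpoint-rule approximation of the integral of min(x, y) f(y) over [0, 1]."""
    h = 1.0 / m
    return sum(min(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(m)) * h

# Spot check of the eigenvalue equation K f = lambda f at one point.
n, x = 1, 0.3
omega = (n + 0.5) * math.pi
lhs = apply_kernel(lambda y: math.sin(omega * y), x)  # (K f)(x)
rhs = eigenvalue(n) * math.sin(omega * x)             # lambda_n f(x)

print(sum_of_squares, 1.0 / 6.0)
print(lhs, rhs)
```

The first line printed compares the truncated sum with $1/6$; the second compares the two sides of the eigenvalue equation at the chosen point.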


15.39 (3) If $\phi A = 0$ for all $\phi \in \mathcal{S}$ and $A$ is self-adjoint, then $\sigma(A) \subseteq \mathcal{S}(A) = \{0\}$ 
and $A = 0$. 

(7) o(T) © B,[0], and o (T)~! = o (T~!) C By [0]. 

(8) By the spectral mapping theorem $\{0\} = \sigma(P^2 - P) = \{\lambda^2 - \lambda : \lambda \in \sigma(P)\}$, 
so $\lambda = 0, 1$. 

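15.44 (4) Let $T = A + iB$ with $A$, $B$ self-adjoint. Then $A \ge 0$ implies $\mathcal{S}(T) \subseteq 
\mathcal{S}(A) + i\mathcal{S}(B) \subseteq \mathbb{R}^+ + i\mathbb{R}$. Conversely, $A = (T + T^*)/2$, so $\phi A = (\phi T + \overline{\phi T})/2 = 
\operatorname{Re}\phi T \ge 0$ for $\phi \in \mathcal{S}$. 

15.49 (4) $|T^*|^2 = T|T|U^* = TU^*U|T|U^* = (TU^*)^2$. 

(10) $|T|$ is invertible, so let $U := T|T|^{-1}$; it is unitary, e.g. $UU^* = T|T|^{-2}T^* = 
T(T^*T)^{-1}T^* = 1$. 

(11) (b) $T = U|T| = S|T|^{1/2}$, where $S := U|T|^{1/2} \in HS$, so $\operatorname{tr}(T) = \operatorname{tr}(S|T|^{1/2})$ is 
independent of the basis. 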


1 1 
(c)|tr(ST)| = ea = |(U|T|?, S*|T|?)ys| 
< |VITIZllasIS*IT Ills < IS*IITI lus = = (SINT It 
1.2 
(d) The norm axioms are satisfied because ||7||7, = |||T|2 ll 75 and 
tr|S+7|=tU*(S+T) = (U,S)y5 + (U,T) Hg < WSllas + IT Ila: 


Also, 
T* lq, = tr UT* =trT*U = tr TI. 


(e) |T| = U*T = CB where C := U*A € HS. So tr|T| = (C*,B) < 
Alls ll Blais: 
(f) If en, ef, are the seule vectors of T, then |T|*en = = [Adel 25 Take the polar 


decomposition of (e’, Ten) = e!|(e’,, Ten)|, and let Ven := elne! Then 


ens en ? 


> \(el,, Teén)| = > (En, U* Ten) = tr(U*T) < IT |r, 
n n 


lf Tey = Ané,,< then ||T |p = dp psf en) = Don ns 


Glossary of Symbols 


$\to$  Converges to 
$\rightharpoonup$  Weak convergence 
$\|\cdot\|_X$  Norm of space $X$ 
$\langle\cdot,\cdot\rangle_X$  Inner product of space $X$ 
$1_E$  Characteristic function on $E$ 
$\sum_n a_n$  Series of terms 
$[a_n]$  Equivalence class of sequence $(a_n)$ 
$T^*$  Hilbert adjoint of an operator $T$, or the involute of an algebra element 
$T^\top$  Adjoint of an operator $T$ 
$x^\top$  Dual of a sequence $x$ 
$\hat{T}$  Gelfand/Fourier transform of $T$ 
$[\![A]\!]$  Span of vectors in $A$ 
$A^c$  Complement of set $A$ 
$\mathcal{A}'$  Commutant algebra of $\mathcal{A}$ 
$A^\circ$  Interior of set $A$ 
$A^\perp$  Annihilator or orthogonal complement of $A$ 
${}^\perp A$  Pre-annihilator of $A$ 
$X^*$  Dual space of $X$, or set of conjugates of $X$ 
$\partial A$  Boundary of set $A$ 
$\bar{A}$  Closure of set $A$ 
$xy$  Multiplication of sequences 
$x \cdot y$  Dot product of sequences 
$x * y$  Convolution of sequences or functions 
$A + B$  Addition of sets 
$A \oplus B$  Direct sum of subspaces 
$X \cong Y$  Isomorphic spaces 
$X \equiv Y$  Isometric spaces 
$X \subseteq Y$  $X$ is embedded in $Y$ 
$X/M$  Quotient space of $X$ by $M$ 
$B(X)$  Space $B(X, X)$ 
$B(X, Y)$  Space of continuous linear operators $X \to Y$ 
$B_r(a)$  Ball of radius $r$, center $a$ 
$B_r[a]$  Closed ball 
$B_X$  Unit open ball of $X$ 
$c$  Space of convergent sequences 
$c_0$  Space of sequences that converge to zero 
$C(X)$  Space $C_b(X, \mathbb{C})$ 
$C_b(X, Y)$  Space of bounded continuous functions $f : X \to Y$ 
$C^n(\mathbb{R}, X)$  Space of $n$-times continuously differentiable functions 
$C^\omega(A)$  Space of analytic functions on $A$ 
$\mathbb{C}[x, y]$  Space of polynomials in $x$, $y$ 
$\operatorname{codim} A$  Codimension of subspace $A$ 
$d$  Distance function 
$D$  Differentiation operator 
$D(U, Y)$  Set of differentiable functions 
$D_1$  "Taxicab" distance on $X \times Y$ 
$D_\infty$  Max distance on $X \times Y$ 
$\Delta$  Character space of $\mathcal{X}$ 
$\delta_x$  Dirac functional 
$\dim X$  Dimension of space $X$ 
$\mathbb{F}$  A field, usually $\mathbb{R}$ or $\mathbb{C}$ 
$\mathcal{G}(\mathcal{X})$  Group of invertibles of $\mathcal{X}$ 
$I$  Identity operator 
$\operatorname{im} T$  Image of a linear map $T$ 
$\operatorname{index}(T)$  Index of a Fredholm operator $T$ 
$\mathcal{J}$  Radical of an algebra 
$\ker T$  Kernel or null space of a linear map $T$ 
$L$  Left-shift operator 
$\ell^p$  Space of sequences with the $p$-norm 
$L^p(A)$  Space of functions on $A$ with the $p$-norm 
$\lim_{n\to\infty}$  Limit as $n \to \infty$ 
$\mu$  Lebesgue measure on $\mathbb{R}^N$ 
$M_a$  Multiplication operator by $a$ 
$\mathcal{S}(\mathcal{X})$  State space of an algebra 
$R$  Right-shift operator 
$\rho(T)$  Spectral radius of $T$ 
$\sigma(T)$  Spectrum of $T$ 
$T_a$  Translation by $a$ 
$\operatorname{tr}(T)$  Trace of $T$ 
$W(T)$  Numerical range of $T$ 


Further Reading 


Functional analysis impinges upon a wide range of mathematical branches, from 
linear algebra to differential equations, probability, number theory, and optimization, 
to name just a few, as well as such varied applications as financial investment/risk 
theory, bioinformatics, control engineering, quantum physics, etc. 

As an example of how functional analysis techniques can be used to simplify 
classical theorems, consider Picard's theorem for ordinary differential equations. The 
differential equation $y' = F(x, y)$, $y(a) = y_a$, is equivalent to the integral equation 
$y(x) = Ty(x) := y_a + \int_a^x F(s, y(s))\,ds$. It is not hard to show that if $F$ is Lipschitz 
in $y$ and continuous in $x$, then $T$ is a contraction map on $C[a - h, a + h]$ for some 
$h > 0$, and the Banach fixed point theorem then implies that the equation has a 
unique solution locally. 
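
The fixed-point argument can be watched in action by iterating the integral map numerically. The sketch below is an illustration only, under several assumptions of ours: the test equation $y' = y$, $y(0) = 1$ (so $F(x, y) = y$ and the exact solution is $e^x$), the one-sided interval $[a, a + h]$ with $h = 0.5$ instead of $[a - h, a + h]$, a uniform grid, and the trapezoid rule for the integral. The function name `picard_iterate` is hypothetical.

```python
import math

def picard_iterate(F, a, ya, h, n_grid=200, n_iter=25):
    """Apply y <- ya + integral_a^x F(s, y(s)) ds repeatedly on a grid over [a, a + h]."""
    xs = [a + h * i / n_grid for i in range(n_grid + 1)]
    y = [ya] * (n_grid + 1)  # initial guess: the constant function ya
    dx = h / n_grid
    for _ in range(n_iter):
        g = [F(x, yi) for x, yi in zip(xs, y)]
        new_y = [ya]
        acc = 0.0
        for i in range(1, n_grid + 1):
            acc += 0.5 * (g[i - 1] + g[i]) * dx  # cumulative trapezoid rule
            new_y.append(ya + acc)
        y = new_y
    return xs, y

# Test problem (our choice): y' = y, y(0) = 1, exact solution exp(x).
xs, ys = picard_iterate(lambda x, y: y, a=0.0, ya=1.0, h=0.5)
max_err = max(abs(yi - math.exp(x)) for x, yi in zip(xs, ys))
print(max_err)
```

Here $F$ has Lipschitz constant 1 in $y$, so on an interval of length $0.5$ the integral map contracts sup-norm distances by at least a factor $0.5$ per step; the remaining error is dominated by the quadrature rule rather than by the iteration.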

However, the classical derivative operator is in many ways inadequate: its domain 
is not complete and it is unbounded on several norms of interest. But there is a way 
to extend differentiation to much larger spaces, namely Sobolev spaces and Distributions. 
The former are Banach spaces $L^p_s$ of functions that have certain grades 
of integrability ($p$) and differentiability ($s$), while the latter are spaces of functionals 
that act on them with weak*-convergence. Distributions include all the familiar 
functions in $L^1_{\mathrm{loc}}$, but also other 'singular' ones, such as Dirac's delta 'function' $\delta$ 
and $1/x^n$. Differentiation can be extended as a continuous operator on these spaces, 
e.g. $L^p_s \to L^p_{s-1}$. Moreover, distributions can be differentiated infinitely many times; 
for example, the derivative of the discontinuous Heaviside function $1_{\mathbb{R}^+}$ is $\delta$. But, 
in general, 'singular' distributions cannot be multiplied together. A central result 
is the Sobolev inequality, $\|u\|_{L^q(\mathbb{R}^n)} \le c_{n,p}\|Du\|_{L^p(\mathbb{R}^n)}$, for $n > 2$, $\frac{1}{q} = \frac{1}{p} - \frac{1}{n}$, 
which implies that the identity map $L^p_s(\mathbb{R}^n) \to L^q_{s-1}(\mathbb{R}^n)$, along the arrows in Fig. 1, 
is continuous. The study of operators on such generalized spaces is of fundamental 
importance: from extensions of the convolution and the Fourier transform, to 
pseudo-differential operators of the type $f(x, D)$, singular integrals, and various 
other transforms (see [12, 26, 28]). 

Although unbounded, classical differential operators are normal 'closed operators': 
these have a graph $\{(x, Tx) : x \in X\}$ which is closed in $X \times X$. Quite 
a lot of the spectral theory extends in modified form to them. For example their 
spectrum remains closed but not necessarily bounded. So, if one inverts in a point 
$\lambda \notin \sigma(T)$ then $(T - \lambda)^{-1}$ becomes a regular continuous operator, which can often be 
expressed as an integral operator, whose kernel is called its Green's function. Indeed, 
it turns out that 'elliptic' differential operators become Fredholm self-adjoint operators 
under this inversion. This immediately gives certain results, usually falling under 
the heading of Sturm-Liouville theory, such as that the spectrum of the Laplace operator 
$-\Delta$ on a compact shape in $\mathbb{R}^n$ is an unbounded sequence of isolated positive 
eigenvalues, called the "resonant frequencies" or "harmonics" of the shape. Deeper 
results include the Atiyah-Singer index theorem: the Fredholm index of an elliptic 
differential operator is equal to a certain topological invariant of the domain. 

Fig. 1 Sobolev Spaces (recoverable labels: $L^p$ integrability $1/p$; distributions; slope of line equals $n$) 

The concept of a Banach space can be generalized to a topological vector space, 
namely a vector space with a topology that makes its operations continuous. Many 
theorems continue to hold at least for “locally convex topological vector spaces”, 
including the Hahn-Banach theorem, the open mapping theorem, and the uniform 
boundedness theorem. Other important results are Schauder’s fixed point theorem, the 
Krein-Milman theorem, the analytic Fredholm index theorem, and the Hille-Yosida 
theorem. 

Harmonic analysis is the study of general (but usually locally compact) group 
algebras, especially the Fourier transform. The central results are the Pontryagin 
duality theorem, which asserts that the character space of $L^1(G)$ is itself a group that 
is 'dual' to $G$, and the Peter-Weyl theorem. von Neumann algebras are *-algebras that 
arise as double commutants of C*-algebras. Equivalently, they are the weakly closed 
*-subalgebras of $B(H)$. The spectral theorem holds for them. There is a lot of theory 
devoted to their structure, and a complete classification is still an open problem. 

One must also include some outstanding conjectures: whether every operator on 
a separable Hilbert space has a non-trivial closed invariant subspace; whether every 
infinite-dimensional Banach space admits a quotient which is infinite-dimensional 
and separable; Selberg's conjecture about the first eigenvalue of a specific Laplace-Beltrami 
operator on Maass waveforms; the Hilbert-Pólya conjecture that the non-trivial 
zeros of the Riemann zeta function are the eigenvalues of some unbounded 
operator $\frac{1}{2} + iA$ with $A$ self-adjoint; etc. 